{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature engineering (preparing variables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. [Definition](#1)\n", "2. [Imputation](#2)\n", "3. [Outliers](#3)\n", "4. [Binning](#4)\n", "5. [Log transformation](#5)\n", "6. [One-hot encoding](#6)\n", "7. [Feature splitting](#7)\n", "8. [Feature scaling](#8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Definition\n", "\n", "__[What Is Feature Engineering](https://medium.com/mindorks/what-is-feature-engineering-for-machine-learning-d8ba3158d97a)__\n", "\n", "The process of applying domain knowledge about the data to select or create variables that improve the performance of predictive models. It is best done after Exploratory Data Analysis.\n", "\n", "## Techniques\n", "\n", "- Imputation: handling missing values (dropping them or finding a suitable replacement value).\n", "- Outlier handling: removing outliers or keeping them.\n", "- Binning: grouping values into classes, typically to convert continuous variables into discrete ones.\n", "- Log transformation: dealing with highly skewed distributions.\n", "- One-hot encoding: converting nominal variables into 0s and 1s.\n", "- Feature splitting: e.g. splitting a full name into first name and last name.\n", "- __[Feature scaling](https://en.wikipedia.org/wiki/Feature_scaling)__: bringing variables into recommended ranges." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./01-eda-visual-techniques.png)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "df = pd.read_csv(os.path.join(\"./csv/diabetes.csv\"))\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imputation\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td></tr>\n",
"    <tr><th>mean</th><td>3.845052</td><td>120.894531</td><td>69.105469</td><td>20.536458</td><td>79.799479</td><td>31.992578</td><td>0.471876</td><td>33.240885</td><td>0.348958</td></tr>\n",
"    <tr><th>std</th><td>3.369578</td><td>31.972618</td><td>19.355807</td><td>15.952218</td><td>115.244002</td><td>7.884160</td><td>0.331329</td><td>11.760232</td><td>0.476951</td></tr>\n",
"    <tr><th>min</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.078000</td><td>21.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>25%</th><td>1.000000</td><td>99.000000</td><td>62.000000</td><td>0.000000</td><td>0.000000</td><td>27.300000</td><td>0.243750</td><td>24.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>50%</th><td>3.000000</td><td>117.000000</td><td>72.000000</td><td>23.000000</td><td>30.500000</td><td>32.000000</td><td>0.372500</td><td>29.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>75%</th><td>6.000000</td><td>140.250000</td><td>80.000000</td><td>32.000000</td><td>127.250000</td><td>36.600000</td><td>0.626250</td><td>41.000000</td><td>1.000000</td></tr>\n",
"    <tr><th>max</th><td>17.000000</td><td>199.000000</td><td>122.000000</td><td>99.000000</td><td>846.000000</td><td>67.100000</td><td>2.420000</td><td>81.000000</td><td>1.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 768.000000 768.000000 \n", "mean 31.992578 0.471876 33.240885 0.348958 \n", "std 7.884160 0.331329 11.760232 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df.isnull()\n", "df.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>NaN</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[3,'Age'] = np.nan\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>767.000000</td><td>768.000000</td></tr>\n",
"    <tr><th>mean</th><td>3.845052</td><td>120.894531</td><td>69.105469</td><td>20.536458</td><td>79.799479</td><td>31.992578</td><td>0.471876</td><td>33.256845</td><td>0.348958</td></tr>\n",
"    <tr><th>std</th><td>3.369578</td><td>31.972618</td><td>19.355807</td><td>15.952218</td><td>115.244002</td><td>7.884160</td><td>0.331329</td><td>11.759580</td><td>0.476951</td></tr>\n",
"    <tr><th>min</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.078000</td><td>21.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>25%</th><td>1.000000</td><td>99.000000</td><td>62.000000</td><td>0.000000</td><td>0.000000</td><td>27.300000</td><td>0.243750</td><td>24.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>50%</th><td>3.000000</td><td>117.000000</td><td>72.000000</td><td>23.000000</td><td>30.500000</td><td>32.000000</td><td>0.372500</td><td>29.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>75%</th><td>6.000000</td><td>140.250000</td><td>80.000000</td><td>32.000000</td><td>127.250000</td><td>36.600000</td><td>0.626250</td><td>41.000000</td><td>1.000000</td></tr>\n",
"    <tr><th>max</th><td>17.000000</td><td>199.000000</td><td>122.000000</td><td>99.000000</td><td>846.000000</td><td>67.100000</td><td>2.420000</td><td>81.000000</td><td>1.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 767.000000 768.000000 \n", "mean 31.992578 0.471876 33.256845 0.348958 \n", "std 7.884160 0.331329 11.759580 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#df['Age'].isnull()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>NaN</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "3 1 89 66 23 94 28.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "3 0.167 NaN 0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['Age'].isnull()]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(768, 9)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(768, 9)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dropping missing values\n", "# how='all' only drops rows in which every column is NaN, so no row is removed here\n", "df.dropna(how='all').shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(767, 9)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna(subset=['Insulin', 'Age'], how='any').shape" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>NaN</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>5</td><td>116</td><td>74</td><td>0</td><td>0</td><td>25.6</td><td>0.201</td><td>30.0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "4 0 137 40 35 168 43.1 \n", "5 5 116 74 0 0 25.6 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "4 2.288 33.0 1 \n", "5 0.201 30.0 0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna(subset=['Insulin', 'Age'], how='any', inplace=True )\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50 1 \n", "1 0.351 31 0 \n", "2 0.672 32 1 \n", "3 0.167 21 0 \n", "4 2.288 33 1 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filling in values\n", "# Reload the data first: the row with the missing Age was dropped in place above\n", "df = pd.read_csv(os.path.join(\"diabetes.csv\"))\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df.loc[3,'Age'] = np.nan" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>NaN</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>33.0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 33.0 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign the result back instead of calling fillna(..., inplace=True) on a column selection,\n", "# which newer pandas versions warn about and which may not modify df\n", "#df['Age'] = df['Age'].fillna(0)  # Almost never a good idea!\n", "df['Age'] = df['Age'].fillna(round(df['Age'].mean()))  # Rarely a good idea!\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))\n", "df.loc[3,'Age'] = np.nan" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>NaN</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 NaN 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()\n", "#df.shape" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "#df.loc[df['Age'].notnull(),].head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos = df.groupby('Pregnancies')\n", "por_embarazos" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: Int64Index([ 4, 16, 45, 57, 58, 59, 66, 78, 83, 102,\n", " ...\n", " 649, 677, 681, 682, 697, 713, 727, 736, 753, 757],\n", " dtype='int64', length=111),\n", " 1: Int64Index([ 1, 3, 13, 18, 19, 27, 46, 50, 51, 55,\n", " ...\n", " 726, 739, 742, 746, 747, 751, 755, 758, 766, 767],\n", " dtype='int64', length=135),\n", " 2: Int64Index([ 8, 38, 47, 60, 63, 67, 70, 79, 81, 85,\n", " ...\n", " 707, 709, 728, 729, 732, 733, 734, 738, 760, 764],\n", " dtype='int64', length=103),\n", " 3: Int64Index([ 6, 20, 31, 32, 40, 80, 108, 110, 126, 132, 140, 166, 169,\n", " 190, 197, 227, 234, 242, 256, 260, 261, 263, 272, 304, 313, 316,\n", " 317, 318, 321, 347, 348, 352, 354, 368, 370, 389, 396, 398, 399,\n", " 415, 419, 431, 480, 494, 501, 504, 514, 515, 521, 524, 525, 527,\n", " 539, 541, 551, 570, 572, 588, 592, 610, 611, 615, 644, 659, 673,\n", " 678, 686, 696, 710, 714, 716, 730, 741, 748, 752],\n", " dtype='int64'),\n", " 4: Int64Index([ 10, 35, 39, 69, 73, 91, 93, 107, 113, 115, 118, 
119, 130,\n", " 144, 151, 160, 167, 168, 184, 198, 199, 228, 230, 233, 235, 241,\n", " 262, 264, 288, 320, 350, 351, 363, 364, 378, 393, 394, 400, 406,\n", " 417, 425, 442, 444, 474, 479, 482, 488, 492, 493, 535, 543, 547,\n", " 549, 568, 604, 625, 629, 641, 643, 666, 683, 698, 699, 704, 720,\n", " 725, 735, 750],\n", " dtype='int64'),\n", " 5: Int64Index([ 5, 14, 29, 30, 52, 62, 65, 71, 77, 84, 116, 117, 123,\n", " 139, 141, 148, 178, 179, 183, 189, 195, 205, 207, 216, 218, 219,\n", " 265, 278, 286, 289, 302, 303, 337, 343, 349, 360, 361, 362, 365,\n", " 386, 388, 391, 402, 404, 437, 457, 463, 496, 546, 628, 636, 652,\n", " 684, 711, 719, 723, 765],\n", " dtype='int64'),\n", " 6: Int64Index([ 0, 33, 95, 98, 121, 165, 170, 171, 176, 180, 204, 217, 231,\n", " 243, 295, 310, 319, 329, 366, 401, 410, 439, 469, 495, 499, 502,\n", " 519, 522, 533, 552, 560, 563, 567, 576, 581, 587, 594, 601, 613,\n", " 616, 622, 642, 664, 668, 670, 675, 701, 705, 749, 759],\n", " dtype='int64'),\n", " 7: Int64Index([ 15, 17, 22, 26, 41, 42, 44, 48, 49, 54, 56, 64, 76,\n", " 82, 92, 114, 155, 161, 185, 192, 209, 212, 222, 223, 236, 276,\n", " 282, 283, 285, 314, 339, 473, 477, 498, 503, 517, 555, 603, 612,\n", " 630, 638, 693, 695, 715, 756],\n", " dtype='int64'),\n", " 8: Int64Index([ 2, 9, 21, 53, 61, 111, 133, 154, 175, 186, 188, 194, 206,\n", " 299, 330, 344, 345, 387, 408, 424, 443, 462, 468, 478, 489, 509,\n", " 540, 545, 557, 583, 584, 586, 662, 674, 690, 731, 737, 754],\n", " dtype='int64'),\n", " 9: Int64Index([ 23, 37, 43, 131, 146, 152, 191, 214, 238, 245, 248, 250, 338,\n", " 355, 403, 459, 460, 512, 516, 523, 618, 663, 669, 676, 708, 743,\n", " 761, 762],\n", " dtype='int64'),\n", " 10: Int64Index([ 7, 11, 12, 25, 34, 143, 246, 270, 281, 306, 327, 458, 464,\n", " 505, 542, 578, 634, 660, 667, 672, 706, 712, 717, 763],\n", " dtype='int64'),\n", " 11: Int64Index([24, 36, 193, 259, 558, 559, 590, 614, 648, 658, 740], dtype='int64'),\n", " 12: Int64Index([215, 254, 333, 358, 
 375, 436, 510, 582, 745], dtype='int64'),\n", " 13: Int64Index([28, 72, 86, 274, 323, 357, 518, 635, 691, 744], dtype='int64'),\n", " 14: Int64Index([298, 455], dtype='int64'),\n", " 15: Int64Index([88], dtype='int64'),\n", " 17: Int64Index([159], dtype='int64')}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos.groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Prefer the measure of central tendency that is least affected by outliers:\n", "\n", "**The median.**" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Age</th></tr>\n",
"    <tr><th>Pregnancies</th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>25.0</td></tr>\n",
"    <tr><th>1</th><td>24.0</td></tr>\n",
"    <tr><th>2</th><td>25.0</td></tr>\n",
"    <tr><th>3</th><td>27.0</td></tr>\n",
"    <tr><th>4</th><td>30.0</td></tr>\n",
"    <tr><th>5</th><td>36.0</td></tr>\n",
"    <tr><th>6</th><td>36.5</td></tr>\n",
"    <tr><th>7</th><td>41.0</td></tr>\n",
"    <tr><th>8</th><td>43.0</td></tr>\n",
"    <tr><th>9</th><td>44.0</td></tr>\n",
"    <tr><th>10</th><td>40.5</td></tr>\n",
"    <tr><th>11</th><td>45.0</td></tr>\n",
"    <tr><th>12</th><td>46.0</td></tr>\n",
"    <tr><th>13</th><td>43.5</td></tr>\n",
"    <tr><th>14</th><td>42.0</td></tr>\n",
"    <tr><th>15</th><td>43.0</td></tr>\n",
"    <tr><th>17</th><td>47.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " Age\n", "Pregnancies \n", "0 25.0\n", "1 24.0\n", "2 25.0\n", "3 27.0\n", "4 30.0\n", "5 36.0\n", "6 36.5\n", "7 41.0\n", "8 43.0\n", "9 44.0\n", "10 40.5\n", "11 45.0\n", "12 46.0\n", "13 43.5\n", "14 42.0\n", "15 43.0\n", "17 47.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#por_embarazos.agg({'Age': ['mean','median']})\n", "por_embarazos.agg({'Age': 'median'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### El gráfico de caja muestra la media o mediana?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAEcCAYAAAAmzxTpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dfXxcZZn/8c9lU1JoSwsotaVAUUDT1JVSFrVWJURBEBUfQAO4ZRup5bdWV9htusQV2SVC2QLLsrsgGKQCTUFUVJ6xmeh2AZXyWIgoYimlPGpLaWFLU67fH+dMmKaTZDLnnM6ZnO/79ZpXZs6cuc49T1fuuc597mPujoiIDG9vqXQDREQkeUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLzsws6vN7NxKt6PSBnodzOxUM1uxs9tUjczsZDO7s9LtyDol+xQzs9Vm9pqZbTKz9WZ2i5ntW+l2FTIzN7MDK92OamZmXWb2f+H7/JKZ/djMJla6XXFx9+vc/ahKtyPrlOzT75PuPgaYCDwPXFrh9iTGAln9TH41fJ8PBsYDFxdbycxG7NRWybCR1S9W1XH3/wNuBKbml5nZODP7gZm9aGZPmdk388nSzC4zsxsL1l1kZsvDhHqEma01s7PCnuRqMzu5v22b2Wlm9oSZ/cXMfmZmk8LlvwpXeSjslX6hyGNHmNmF4Xb+ZGZfDX8N1IT3d5lZm5n9L/Aq8A4zmxRu5y/hdk8riLddaSX/XApurzazfzKzx8JfQ983s1EF9x9nZg+a2QYzu9vM/qrgvulmdr+ZvWJm1wO9j+v/pbFLzexlM/udmTWGC08ws5V9VjzTzG4aJB7u/hfgR8C0gud7mZndamabgQYzqzWzxWa2xsyeN7PLzWzXgm0tMLNnzWydmX258NdXGO+/wl+Jr5jZr83snQWPvcTMnjazjWa20sw+VHDft83shvAz94qZPWpmhxXcv2/4q+RFM/uzmf1nuHy7kpeZvdvM7grf38fN7MSC+44N37tXzOwZM/uHwV4zKZG765LSC7Aa+Gh4fTdgCfCDgvt/APwUGAtMAX4PNBes/3vgVOBDwEvA5PC+I4Ae4CKgFvgIsBl4V3j/1cC54fUjw8ceGq57KfCrgjY4cOAAz2Ee8BgwGdgD+EX4mJrw/i5gDVAP1AAjgV8C/02QbA8BXgQa+7at4Lms7fOarQL2B
fYE/rfguRwKvAC8DxgBzA7XrwV2AZ4CvhG24fPA1sJt9Xlep4avYX79LwAvh9usBf4C1BWs/wDwuX5idQFfDq+/FegEril4vi8DHyTonI0C/h34WbitscDPgfPC9T8OPBe+nrsB1xS+R2G8vwCHh6/3dcCygracAuwV3ndmGGtUeN+3gf8Djg1fv/OAe8P7RgAPEfwiGR22c1bBa7UivD4aeBr423AbhxJ8vurD+58FPhRe3wM4tNLfw+FyqXgDdBngzQkS0SZgQ5hY1gHvCe8bAWwBphas/xWgq+D24eEX+ymgqWD5EWG80QXLbgD+Obx+NW8myHbggoL1xhAkwSnh7cGSfSfwlYLbH2XHZP8vBffvC2wDxhYsOw+4um/bCp5L32Q/r+D2scAfw+uXAf/ap32PE/yz+3D4+lrBfXczcLLvu/5vgC8VbKstvF4PrAdq+4nVRfCrZgPwDEECflvB8y38B28E/5jfWbDsA8CfwutXESb+8PaB7Jjsv9fn9fndAO/feuC94fVvA78ouG8q8FpBG17Mv69FXqt8sv8C8D997v8ucHZ4fQ3B53j3Sn//httFZZz0O97dxxP0Fr8K/NLM3k7QA8z3RvOeAvbJ33D33wBPEiSIG/rEXe/um/s8dlKR7U8q3Ia7bwL+XLidQUwi6MnlPV1kncJlk4C/uPsrfdpW6vb6xit8XvsDZ4YlnA1mtoHgn8uk8PKMhxmn4LEDKbZ+fltLgJPMzIAvATe4+5YBYn3N3ce7+z7ufrK7v9jP83kbQY99ZcFzuD1cDqW93s8VXH+V4B840Ftu6g5LUxuAcQSftf4eOyosye0LPOXuPQM8Rwjeg/f1eQ9OBt4e3v85gn9AT5nZL83sA4PEkxIp2VcJd9/m7j8m6PXOIvjpu5Xgy5O3H0HPEAAz+zuCfxLrgAV9Qu5hZqP7PHZdkU2vK9xG+Ji9CrcziGcJSjh5xUYTFSbMdcCeZja2T9vy29tMkOzy3s6OCrdR+LyeJuhtjy+47ObuHWE79wmTc+FjB1Js/XUA7n4v8DpBCe0kgnJKuQpfn5eA1wjKHvnnMM6DnbtQ2utdVFifbwFOBPYIOxkvE3QWBvM0sF9+X8wg6/2yz3swxt1PB3D337r7p4G9gZvYsZMiZVKyrxIW+DRBHbPb3bcRfBHazGysme0PnAFcG65/MHAuQQ32S8ACMzukT9hzzGyX8Et+HPDDIpteCvytmR1iZrXAd4Bfu/vq8P7ngXcM0PQbgK+b2T5mNp4gmfTL3Z8mKJ+cZ2ajwh2ozQSlDYAHgWPNbM/wF87fFwnzd2Y22cz2BM4Crg+XXwnMM7P3ha/nr8IdgXsB9xCUtr5mZjVm9lmCMthA9g7XH2lmJwB1wK0F9/8A+E+gx91jGZPv7m+Ez+NiM9sbIHxtjw5XuYHg/aozs92Abw0h/FiC1+BFoMbMvgXsXuJjf0Pwj+Z8MxsdvncfLLLezcDBZval8HUbaWZ/HbZ3FwvG5I9z963ARoLOjcRAyT79fm5mmwg++G3AbHd/NLxvPkFP90lgBUFivirsXV0LLHL3h9z9DwRJ75owYUPwc3w9QU/0OoI69+/6btzdlwP/TDBC5FngncAXC1b5NrAk/El+Yt/HEySmO4GHCXZS3kqQUAb6EjcR7HBeB/yEoJ57V3jfNQQ7AleHca8v8vil4X1Phpdzw+dyH3AaQQJ+maDXXQsc6+6vA58lqC+vJ6gt/3iANgL8GjiIoLfdBnze3f9ccP81BKNqovTqi2kBngDuNbONBDu93wXg7rcB/wHkwnXuCR8zUAkp7w7gNoId+08R7IwtVgbaQdj5+CTBPoI1wFqC17Dveq8ARxF8htYRfA4XEbwPEHRMVofPax5BZ0ViYNuXHCULzOwI4Fp3nzzYugls+xjgcnfff9CVy4u/mmBkyy8GWe9bwNEECftgdz8uXL4XwU7MjxDsvL0DOMLdZ4X3v5tgRNIMgh7wP7t70VKDBcMhXyAYU
fKHyE+uDGZWRzA6qbaEeroMY+rZS6LMbNdw7HSNme0DnE3QW6+0vyH4RXMdcLSZTQiX/xfBr6W3EwzNnJ1/QLi/4i6CXw57E/wC+W8zq+9nG6cDv93Zid7MPhOWRPYg6DX/XIlelOwlaQacQ1AaeQDoZmh15PgbZDaLYKfzDe6+EvgjwciZEQSjQc5291fd/TGCUTV5xwGr3f377t7j7vcTlLc+X2Qbq4GvE4xV39m+QvCr448E5bLTK9AGSZnB9pzLMOTuXWw/YiPJbb0K/PXO2Fa4vSklrDYbuNPdXwpvLw2XdRB8J/obutg7bLBgWQ1FavIltiMR7v7xSm1b0kvJXjIlrKOfCIwws/yY8VqC+WgmEOw8nkywkxK2H7qYHzb4sZ3UXJHYaAetZIqZNRHU5Q8hGAefdwPwW4JEvw34MsG4+TuBNe4+Kxz7vwr4JrAsfNwhwCZ37945z0CkPKrZS9bMBr7v7mvc/bn8hWA45skERymPIxgSeA1BaWcLlDRsUCS11LMXGYCZLQLe7u6zB11ZJMXUsxcpYMH0u38VHmF7OMHRu2kYKioSiXbQimxvLEHpZhLBAVEXEkwjLVLVVMYREckAlXFERDJAyV5EJAN2as3+rW99q0+ZMmXQ9TZv3szo0aMHXa9UccdLImba4yURM+3xkoiZ9nhJxEx7vCRiVireypUrX3L3txW9c2eeFmvGjBleilwuV9J6pYo7XhIx0x4viZhpj5dEzLTHSyJm2uMlEbNS8YD7XKclFBHJLiV7EZEMULIXEckAJXsRkQxQshcRyQAle5GdoKOjg2nTptHY2Mi0adPo6OiodJMkYzQ3jkjCOjo6aG1tpb29nW3btjFixAiam5sBaGpqqnDrJCtK6tmb2TfM7FEzW2VmHWY2yswOMLNfm9kfzOx6M9sl6caKVKO2tjba29tpaGigpqaGhoYG2tvbaWtrq3TTJEMGTfZmtg/wNeAwd58GjCA4ecMi4GJ3P4jgZNLNSTZUpFp1d3cza9as7ZbNmjWL7m6d3Ep2nlJr9jXArmZWA+wGPAscCdwY3r8EOD7+5olUv7q6OlasWLHdshUrVlBXV1ehFkkWDZrs3f0ZYDGwhiDJvwysBDa4e0+42lpgn6QaKVLNWltbaW5uJpfL0dPTQy6Xo7m5mdbW1ko3TTJk0PnszWwP4EfAF4ANwA/D22e7+4HhOvsCt7r7e4o8fi4wF2DChAkzli1b1neVHWzatIkxY8YM7ZnsxHhJxEx7vCRipj1enDGXL1/Otddey5o1a9hvv/045ZRTaGxsTE37koyZ9nhJxKxUvIaGhpXufljRO/ubNCd/AU4A2gtu/w1wGfASUBMu+wBwx2CxNBFa9cZLImba4yURM+3xkoiZ9nhJxKzWidDWAO83s93MzIBG4DEgB3w+XGc2OnWbiEhqlVKz/zXBjtj7gUfCx1wBtABnmNkTwF5Ae4LtFBGRCEo6qMrdzwbO7rP4SeDw2FskIiKx03QJIiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2IiIZoGQvIpIBSvYiIhmgZC8ikgFK9iIiGaBkLyKSAUr2Q9TR0cG0adNobGxk2rRpdHR0VLpJIiKDKukctBLo6OigtbWV9vZ2tm3bxogRI2hubgagqampwq0TEemfevZD0NbWRnt7Ow0NDdTU1NDQ0EB7ezttbW2VbpqIyICU7Iegu7ubWbNmbbds1qxZdHd3V6hFIiKlUbIfgrq6OlasWLHdshUrVlBXV1ehFomIlEbJfghaW1tpbm4ml8vR09NDLpejubmZ1tbWSjdNRGRAg+6gNbN3AdcXLHoH8C3gB+HyKcBq4ER3Xx9/E9MjvxN2/vz5dHd3U1dXR1tbm3bOikjqDdqzd/fH3f0Qdz8EmAG8CvwEWAgsd/eDgOXh7WGvqamJVatWsXz5clatWpW6RK+ho
SJSzFCHXjYCf3T3p8zs08AR4fIlQBfQEl/TZKg0NFRE+jPUmv0XgXxXcYK7PwsQ/t07zobJ0GloqIj0x9y9tBXNdgHWAfXu/ryZbXD38QX3r3f3PYo8bi4wF2DChAkzli1bNui2Nm3axJgxY0p8CoOLO14SMeOI19jYyB133EFNTU1vvJ6eHo4++miWL1+eijZWU7wkYqY9XhIx0x4viZiVitfQ0LDS3Q8reqe7l3QBPg3cWXD7cWBieH0i8PhgMWbMmOGlyOVyJa1XqrjjJREzjnj19fXe2dm5XbzOzk6vr6+PHLswZlzSHi+JmGmPl0TMtMdLImal4gH3eT/5dyhlnCbeLOEA/AyYHV6fDfx0CLEkARoaKiL9KWkHrZntBnwM+ErB4vOBG8ysGVgDnBB/82QokhgaamZFl3uJ5T8RSYeSkr27vwrs1WfZnwlG50iKNDU10dTURFdXF0cccUTkeIVJfcrCW1h9/icixxSRnU9H0IqIZECqkr0OCIpOr2F0ZtZ7aWho6L0uUs1SM5+9DgiKTq9hPFS6kuEoNT17HRAUnV5DEelPapK95oqPTq+hiPQnNcm+WuaKT3NNvFpeQxHZ+VJTs88fEJSvN+cPCEpTCSLtNfFqeA1FpDJSk+yrYa74wpp4fhx7e3s78+fPT0U7q+E1FJHKSE2yrwbVUBOP+6CquA00hFFH5YokJzXJPu0lEnizJt7Q0NC7TDXxodGwRpHKSM0O2moYNqiJxkSkWqWmZ18tJRJQTVxEqk9qevbVMmww7eegFREpJjXJXiUSEZHkpKaMoxKJiEhyUtOzh2yWSOI+IjfNR/iKSOWkpmefRXEPN62G4asiUhmp6tlnTdzDTath+KqIVEaqkn01lDTijBn3cNNqGL4q8Sh2cpW0nWBFJcV0SU0ZpxpKGnHHjPuIXB3hmx35I5HTehSySoop5O477TJjxgzvT319vXd2drq7ey6Xc3f3zs5Or6+v7/cxA4k7XhIxly5d6gcccIB3dnb6XXfd5Z2dnX7AAQf40qVLUxGvr/1bbo4lTlLx8u9JnNLexrjb5x5PG5P4/uUl8T7HHbNS8YD7vJ/8m5qefTWUNOKOGfdw06amJu6++26OOeYYtmzZQm1tLaeddtqw70l1dHTQ1tbW+xq2trYO++ecdioppk9qkn01lDSSiBnnLJUdHR3ccsst3Hbbbdv9dJ45c+awTX4qF6STSorpk5odtHEfQZvEEblpP8o3i6Nxsvicq0HavytZVFLP3szGA98DpgEOzAEeB64HpgCrgRPdfX25DUmipBFnvKRixqm7u5u1a9cybdq03va1tLQM65/OWXzO1SDt35UsKrWMcwlwu7t/3sx2AXYDzgKWu/v5ZrYQWAi0RGlM3CfeSOJEHmk+OcikSZNYsGABS5cu7S1pnHTSSUyaNKnSTUtMFp9ztUjzdyWLBi3jmNnuwIeBdgB3f93dNwCfBpaEqy0Bjk+qkVK6vmOt0zb2OglZfM4iQ2U+yKngzOwQ4ArgMeC9wErg68Az7j6+YL317r5HkcfPBeYCTJgwYcayZcsGbdSmTZsYM2bMEJ7Gzo2XRMw44jU2NtLS0kJHRwdr1qxhv/32o6mpiUWLFrF8+fLIbTz19s1c/fHRkePEGa/annPcn5u42wfp/GwnGS+JmJWK19DQsNLdDyt6Z39jMvMX4DCgB3hfePsS4F+BDX3WWz9YrIHG2bsH48Tr6+v9LW95i9fX10ceHx53vEJpHJeb5Nhm93SOs6+255yVcfbVFC+JmNU6zn4tsNbdfx3evpGgPv+8mU1092fNbCLwQgmx+lUNR9CmXX4ERP4550dADOeRKVl8ziLlGDTZu/tzZva0mb3L3R8HGglKOo8Bs4Hzw78/jdKQwiF0+R067e3tzJ8/v6zkHHe8apDFERBZfM4i5Sh1nP184Dozexg4BPgOQZL/mJn9AfhYeLtshUPo8hMnrV27NlVH0FaDLJ4TIIvPuRpoIrR0K
Wnopbs/SFC776sxroZMmjSJlpYWrrvuut6yy8knn1z2EDodwSdSOVkso6Zdao6ghTdn8uvv9lDoCD6RytGRzemTmrlx1q1bx9VXX71d7fWCCy7g1FNPLSuearmSFv2N+4/SmUm7JMqomvAumtQk+7q6OiZPnsyqVat6d6jmcrnUTDImUq7CpJ7W+efjFncZVWWh6FJTxlHZRWT4iPv7rLJQdKnp2avsEg/91JU0iPv7nNXRdXFKTbIHlV2i0k9dSZM4v88aXRddaso4Ep1+6spwpTJvdKnq2Us0+qkrw5XKvNGpZ19hcR5lmP+pWygLP3V1pGY26EjpaNSzr6C4a+xZnBRM+ylESqOefQXFXWNvamqira2N+fPnc/TRRzN//vxh/1NX+ylESqNkP0RxlgySqLFn7adu3BPoiQxXKuMMQdwlAw0niy7uCfREhiv17Icg7pKBhpPFI84J9ESGK/Xsh6C7u5sf/vCHHHPMMWzZsoXa2lrmzJlTdslAw8mii3sCPZHhSsl+CMaPH88VV1zBBRdcwNSpU3nsscdYsGAB48ePH/zB/dBRw9EkMYGeyHCkZD8EGzduZNy4cUyfPp1t27Yxffp0xo0bx8aNGyvdtMyKc7jpe8+5k5df27rD8ikLb9nu9rhdR/LQ2UeV3WaRSlCyH4Kenh4WL168Xclg8eLFzJkzp9JNy6w4S2Evv7Z1h+mHi/3i6pv8RapBqnbQxn0kZNzxamtrWb58+XbLli9fTm1tbWramEVZG26aVfquRJOann3cwxqTOLLyIx/5CNdddx2nn346559/PrfeeiuXXXYZRx1V3k96Hf0pUhp9V6JLTc8+7mGNSRxZ+cwzz3D88cdz1VVX8clPfpKrrrqK448/nmeeeSY1bRQZjvRdiS41Pfu4jyZN4ujU7u5uHnjgAUaOHNlby926dSujRo1KTRtFhiN9V6JLTc8+7hkbk5gBshraKDIc6bsSXWqSfdxHkyZxdGo1tFFkONJ3JbqSyjhmthp4BdgG9Lj7YWa2J3A9MAVYDZzo7uvLbUjcR5M2NTVx9913b3e062mnnRZpZ04SbYwzXlaZWdHlmjahdGl/DfVdiW4oNfsGd3+p4PZCYLm7n29mC8PbLVEaE+fRpB0dHdxyyy3cdttt2+29nzlzZuSEH+cRrzqCNrp8Qpqy8JYdxsnLwPIHku3fcnPR+/PHFKThQDJ9V6KJsoP208AR4fUlQBcRk32cCvfe5z8c7e3tzJ8/X70BkZAOJMuOUpO9A3eamQPfdfcrgAnu/iyAuz9rZnsXe6CZzQXmAkyYMIGurq5BN7Zp06aS1htId3c327Zto6urqzfetm3b6O7ujhw7rjZWU7y8uGOmLV7fx/f3OkbZTtaeczV8ttPexljiufugF2BS+Hdv4CHgw8CGPuusHyzOjBkzvBS5XK6k9QZSX1/vnZ2d28Xr7Oz0+vr6yLELY8Yl7fHc3fdvuXlYxyv2+GKvY5TtZPE5V8NnO+1tLDUecJ/3k39LGo3j7uvCvy8APwEOB543s4kA4d8Xov3biZf23ouIvGnQMo6ZjQbe4u6vhNePAv4F+BkwGzg//PvTJBs6VNp7L5WmWTQlTUqp2U8AfhIOzaoBlrr77Wb2W+AGM2sG1gAnJNfM8mjvvVSSdn5Kmgya7N39SeC9RZb/GWhMolEiIhKv1MyNI8ObShoilTWsk31HRwdtbW29NfvW1lbV7CtEJY3oquEfZr6NTy06ruj9+YO39E995xu2yV7zX8twUw3/MHvbeP6b0yykrY1ZlZqJ0OKm+a9FRN6Uqp79/PnzufLKK7ebuOzSSy8tK1Z3dzdr165l2rRpvWWclpaW1M1/rVKTiBR6z5L3FL9jyY6LHpn9SMlxU5Ps58+fz+WXX86iRYuYOnUqjz32GC0twVQ75ST8SZMmsWDBApYuXdpbxjnppJOYNGlS3E0vm0pNItJXsQQex/Dx1JRxrrzyShYtWsQZZ5zBqFGjOOOMM
1i0aBFXXnll2TH7Ttva3zSulaJSk4jsLKnp2W/ZsoV58+Ztt2zevHmceeaZZcVbt24dRx55JI2Njbg7ZkZjYyOdnZ1xNDcWSZxqLe3zksvwNrZuIe9ZsnDHO5b0XQ+gstNRZ+27kppkX1tby+WXX84ZZ5zRu+zyyy+ntra2rHjjx48nl8uxePHi3rLQggULGD9+fFxNjix/qrWGhobeZVFPteaa210q6JXu81M/YiivMKln4fuSmmR/2mmn9dbop06dykUXXURLS8sOvf1Sbdy4kXHjxjF9+nS2bdvG9OnTGTduHBs3boyz2ZHkJ2vL1+zzk7WpjCMicUtNss/vhD3rrLN6R+PMmzev7NE4PT09LF68eLuJ0BYvXsycOXPibHYkaZ6srRoO4JHoVHbJjtQk+7jV1tayfv16Vq1a1fsz8qKLLiq7LJQ11XAAj0SX9rJLYadjoFMnqtMxuNQk+7iHXsZdFkqChl6KDEydjvikJtkXDr3s6urq3VF71llnlZXs4y4LJUHnyRWRnSU1yT7uoZcAM2fOJJfL0d3dzYEHHsjMmTOjNjNWSQy9lPSopnp4nIr2sm/fcV9PpWR1f1Rqkn3cQy+roUSSxNBLSY+018OTUGz4YpRhjUn8w8xqaSg1yT7uGns1lEg09FJkYFn8h5mU1CT7uGvs3d3dfOc739nhCNqoJZI4Jy5L89DLtEvip3hWyy4STVITl8UtNckegoR/6aWXxjLpz6677sovfvELTj/9dI499lhuvfVWLrvsMkaPHl12zCRKQzpPbnmS+CmuXqSUI6mJy+KWmonQ4rZ582bGjh3LCSecwKhRozjhhBMYO3YsmzdvLjumJi4TkWqVqp593C688MLtSiQXXnghc+fOLTueRs+IVL+slutSlezjrIebGZdccglPPPEEb7zxBk888QSXXHJJpGmO6+rqOOecc7jpppt623j88cdr9IxIFclquS41yT7uevjkyZN59NFHmTlzJt/4xje4+OKLufvuu9l3333LbmNDQwOLFi3a4SjfNB2VKyJSTMnJ3sxGAPcBz7j7cWZ2ALAM2BO4H/iSu79ebkPiHir5wgsvcPDBB3PPPfdw9913Y2YcfPDBPPXUU+U2kVwuR0tLC1ddddV2pzq86aabyo4pIgNL+0Fa1WIoPfuvA93A7uHtRcDF7r7MzC4HmoHLym1I3PXwLVu2sHDhQi688MLexHzmmWdGmvWyu7ubBx54gHPPPbf3H9LWrVs577zzyo4p5clq3TUJaU6mcR+klWUlJXszm0zwjWkDzrCg8H0kcFK4yhLg20RI9nEfTVpTU8OZZ57Jj370o96y0Oc+9zlqasqvXOmI1/TIYt01iX9wSqbZUWrm+3dgATA2vL0XsMHde8Lba4F9ojQk7qNJd999d15++WUeeOABpk6dysMPP9x7QpO0tFFkKLL4D07iM2iyN7PjgBfcfaWZHZFfXGTVomcQMLO5wFyACRMm0NXVVXQ7EydO5OSTT2bOnDmsWbOG/fbbj1NOOYWJEyf2+5iBbNiwgeOOO46FCxeydetWRo4cySc+8QluvvnmsuIl0cbCXwh95XK5stpYqNzn2d/jN23aVDRmqdtJe7wkYqY9XqnbiSpt8UopXY0eWf52+ntfyhVLPHcf8AKcR9BzXw08B7wKXAe8BNSE63wAuGOwWDNmzPBS5HK5ktYbSH19vXd2dm4Xr7Oz0+vr6yPHLowZl/1bbk5VvGKPL/acS91O2uMlETPt8YayneEcL4mYceeHUuMB93k/+XfQI2jd/Z/cfbK7TwG+CHS6+8lADvh8uNps4KfR/u3EK19yyeVy9PT09JZcWltbK900EZGdLso4+xZgmZmdCzwAtEdtjCYZk0pL88gUkSiGlOtJrRcAAA9wSURBVOzdvQvoCq8/CRweV0M0yZhUmkamyHCWmonQNMmYiEhyUjNdQhKTjPU3D06wH0NEktL3u2eLgr9RvnuFMfPxosRMoo1plpqef
f6ApUJRD1jK74Xev+XmviOMRCRBhd+3XC4Xy3evWLwoMZNoY5qlpmevA5ZkONIOX0mL1CR7jZ6R4UY7fCVNUpPsQaNnRAajXwpSrlQlexm+NEtldPqlIFEo2ctOoUm8RCorNcl+oNMFDuc95HF47zl38vJrW3dY3jdxjtt1JA+dfVRJMauhJ66ShkjpUpPsCxO6fpoOzcuvbY2915z2nrhKGiJDk5px9iIikpzU9OxFpHKydjRpFqlnLyKZO5o0i5TsRUQyQGWcIdDEaiJSrdSzH4LCn7qFk6uJiKSdkr2ISAYo2YuIZICSvYhIBijZi4hkgJK9iEgGKNmLiGSAkr2ISAYo2YuIZMCgR9Ca2SjgV0BtuP6N7n62mR0ALAP2BO4HvuTuryfZ2OEiifnnRUQGUsp0CVuAI919k5mNBFaY2W3AGcDF7r7MzC4HmoHLEmzrsJHE/PMiIgMZtIzjgU3hzZHhxYEjgRvD5UuA4xNpoYiIRFZSzd7MRpjZg8ALwF3AH4EN7t4TrrIW2CeZJoqISFQlzXrp7tuAQ8xsPPAToK7YasUea2ZzgbkAEyZMoKurq6SGlbpeqeKOFzVm38du2rSpaLxStjGU88V2dY2uSBuTiFfKNuKQ9s9i3PH6e1+Ga7wkYqYyXuFMjqVcgLOBfwReAmrCZR8A7hjssTNmzPBS7N9yc0nrlSrueFFjFntsLpcrextxx0siZhJtjPOxOytm2uO5F39fhnO8JGJWKh5wn/eTfwct45jZ28IePWa2K/BRoBvIAZ8PV5sN/DTavx0REUlKKWWcicASMxtBUOO/wd1vNrPHgGVmdi7wANCeYDtLpmGN8Sk6Guj2HV/HSsXLKzypTP7cqZCek8okcX7XtD9nSZ9Bk727PwxML7L8SeDwJBoVhYY1xqPvawjBa1ZseSXiFconuGLvcxoUJuC42pj25yzpoyNoRUQyoOLnoFXZRUQkeRVP9iq7iIgkT2UcEZEMqHjPvhrEXWoaykFQEH0HpoiIkn0J4i41vdJ9vkpXIrJTqYwjIpIBSvYiIhmgMs4wkdTRqSIyPCjZDwNJHp0qIsODyjgiIhlQ8Z59NQxDrIY2ZllHRwdtbW10d3dTV1dHa2srTU1NlW6WSKpUPNlXwzDEamhjVnV0dNDa2kp7ezvbtm1jxIgRNDc3AyjhixRQGUeqWltbG+3t7TQ0NFBTU0NDQwPt7e20tbVVumkiqVLxnn3cqqXkUi2jZ5KYiz1O3d3dzJo1a7tls2bNoru7u0ItEkmnYZfsq6HkUk2jZ5KYiz1OdXV1rFixgoaGht5lK1asoK6u2GmSRbJLZRypaq2trTQ3N5PL5ejp6SGXy9Hc3Exra2ulmyaSKsOuZy/p1l9ZCMorDeV3ws6fP793NE5bW5t2zor0oZ697FSFZ7vP5XLb3S5XU1MTq1atYvny5axatUqJXqQIJXsRkQxQshcRyYBU1OzjHoaYxLDGahkqKdGlfbipSDkqnuzjHoaYxLDGahoqKdGlfbipSDlUxhERyYBBe/Zmti/wA+DtwBvAFe5+iZntCVwPTAFWAye6+/pyGxL3kLy+MeOOVxgz7nhxxYwjnogMD6X07HuAM929Dng/8HdmNhVYCCx394OA5eHtsiUxJC+peH1jxh0vTc9ZRIaHQZO9uz/r7veH118BuoF9gE/z5owzS4Djk2qkiIhEY0Pp9ZnZFOBXwDRgjbuPL7hvvbvvUeQxc4G5ABMmTJixbNmyQbezadMmxowZU3K7dna8JGKmPV4SMdMeL4mYaY+XRMy0x0siZqXiNTQ0rHT3w4reWfhTf6ALMAZYCXw2vL2hz/3rB4sxY8YML0UulytpvVLFHS+JmGmPl0TMtMdLImba4yURM+3xkohZqXjAfd5P/i1pNI6ZjQR+BFzn7j8OFz9vZhPD+ycCL5QSS0REdr5Bk70FwzvagW53v6jgrp8Bs8Prs4Gfxt88ERGJQykHVX0Q+BLwiJk9GC47Czgfu
MHMmoE1wAnJNFFERKIaNNm7+wrA+rm7Md7miIhIEnQErYhIBgxp6GXkjZm9CDxVwqpvBV6KcdNxx0siZtrjJREz7fGSiJn2eEnETHu8JGJWKt7+7v62Ynfs1GRfKjO7z/sbK5qCeEnETHu8JGKmPV4SMdMeL4mYaY+XRMw0xlMZR0QkA5TsRUQyIK3J/oqUx0siZtrjJREz7fGSiJn2eEnETHu8JGKmLl4qa/YiIhKvtPbsRUQkRqlK9mb2cTN73MyeMLNI8+OH8a4ysxfMbFVM7dvXzHJm1m1mj5rZ12OIOcrMfmNmD4Uxz4mprSPM7AEzuzmGWKvN7BEze9DM7oupfePN7EYz+134en4gQqx3hW3LXzaa2d9HbN83wvdjlZl1mNmoiPG+HsZ6tNy2Ffs8m9meZnaXmf0h/LvDzLNDjHdC2MY3zGzIoz/6iflv4fv8sJn9xMzGDxSjhHj/GsZ60MzuNLNJUeIV3PcPZuZm9tZS4w3Qxm+b2TMFn8ljI8a7viDW6oLZDErX3wxpO/sCjAD+CLwD2AV4CJgaMeaHgUOBVTG1cSJwaHh9LPD7GNpowJjw+kjg18D7Y2jrGcBS4OYYYq0G3hrz+70E+HJ4fRdgfIyfo+cIxhuXG2Mf4E/AruHtG4BTI8SbBqwCdiM4av0XwEFlxNnh8wxcACwMry8EFkWMVwe8C+gCDoupjUcBNeH1RTG0cfeC618DLo8SL1y+L3AHwXFAQ/qs99PGbwP/UObnZcC8BVwIfGuocdPUsz8ceMLdn3T314FlBCdIKZu7/wr4SxyNC+P1dyKXKDHd3TeFN0eGl0g7UsxsMvAJ4HtR4iTFzHYn+EC3A7j76+6+IabwjcAf3b2Ug/cGUgPsamY1BEl6XYRYdcC97v6qu/cAvwQ+M9Qg/Xyeyz6JULF47t7t7o8PtW2DxLwzfN4A9wKTI8bbWHBzNEP4vgyQEy4GFgwlVgkxyzJQvHBiyhOBjqHGTVOy3wd4uuD2WiIm0iRZcCKX6QQ98aixRoQ/y14A7nL3qDH/neCD+0bUtoUcuNPMVlpwMpqo3gG8CHw/LDV9z8xGxxAX4IuU8UUo5O7PAIsJJvh7FnjZ3e+MEHIV8GEz28vMdgOOJehJxmGCuz8LQWcE2DumuEmZA9wWNYiZtZnZ08DJwLcixvoU8Iy7PxS1XX18NSw3XTWU8togPgQ87+5/GOoD05Tsi022lsqhQmY2hmB+/7/v08soi7tvc/dDCHo8h5vZtAhtOw54wd1XRm1XgQ+6+6HAMQTnIP5wxHg1BD9TL3P36cBmIp7DGMDMdgE+BfwwYpw9CHrMBwCTgNFmdkq58dy9m6B8cRdwO0GJsmfABw1DZtZK8LyvixrL3Vvdfd8w1lcjtGk3oJWI/zCKuAx4J3AIQYfhwpjiNlFmZyZNyX4t2/d2JhPtp3MirPiJXGIRljK6gI9HCPNB4FNmtpqgFHakmV0bsV3rwr8vAD8hKLlFsRZYW/AL5kaC5B/VMcD97v58xDgfBf7k7i+6+1bgx8DMKAHdvd3dD3X3DxP8RB9yz6wfVXESITObDRwHnOxh4TkmS4HPRXj8Own+qT8UfmcmA/eb2dujNMrdnw87cW8AVxL9O0NYUvwscH05j09Tsv8tcJCZHRD20L5IcIKU1AjrZcVO5BIl5tvyoxPMbFeCRPO7cuO5+z+5+2R3n0LwGna6e9m9UjMbbWZj89cJdrZFGt3k7s8BT5vZu8JFjcBjUWKGyu719LEGeL+Z7Ra+540E+2fKZmZ7h3/3I/jCxtFOqIKTCJnZx4EW4FPu/moM8Q4quPkpon1fHnH3vd19SvidWUswCOO5iG2cWHDzM0T8zoQ+CvzO3deW9ehy9hYndSGoZf6eYFROawzxOgh+Qm0leBObI8abRVBaehh4MLwcGzHmXwEPhDFXUcZe9gFiH0HE0TgE9fWHwsujcbwvYdxDgPvC530TsEfEeLsBfwbGxdS+cwiSyCrgGqA2Y
rz/IfiH9hDQWGaMHT7PwF7AcoJfCsuBPSPG+0x4fQvwPHBHDG18gmB/XP47M5TRM8Xi/Sh8Xx4Gfg7sEyVen/tXM/TROMXaeA3wSNjGnwETo7YRuBqYV+5nUEfQiohkQJrKOCIikhAlexGRDFCyFxHJACV7EZEMULIXEckAJXupCma2LZzxb5WZ/TA88rEqmNndlW6DiJK9VIvX3P0Qd58GvA7MK7zTAqn8PLt7pKNvReKQyi+HyCD+BzjQzKZYMBf+fwP3A/ua2VFmdo+Z3R/+AhgDYGbHhnOqrzCz/7Bwnv9w3vGrzKzLzJ40s6/lN2JmN4WTvz1aOAGcmW0KJ+J6yMzuNbMJ4fIJFszX/lB4mZlfv+Cx/2hmvw0nyDonXDbazG4JH7PKzL6wE15DyRgle6kq4fwgxxAcnQjB3Os/8DcnVPsm8FEPJm67DzjDghOPfBc4xt1nAW/rE/bdwNEE85ecHc5/BDDH3WcAhwFfM7O9wuWjCaYsfi/wK+C0cPl/AL8Mlx9KcMRxYduPAg4Kt3MIMCOcVO7jwDp3f2/4y+X28l8hkeKU7KVa7BpOA30fwdw17eHyp9z93vD6+4GpwP+G684G9idI5k+6+5/C9frOS3OLu29x95cIJhKbEC7/mpk9RDAH+74EiRqCMlL+DGArgSnh9SMJZjvEg0mwXu6znaPCywMEv0TeHcZ8BPiomS0ysw8VeZxIZDWVboBIiV7zYBroXsEcZWwuXERwPoCmPutNHyT2loLr24AaMzuCYOKpD7j7q2bWBeRPTbjV35xnZBulf48MOM/dv7vDHWYzCOaGOs/M7nT3fykxpkhJ1LOX4eRe4INmdiAEc5Wb2cEEE5q9IzzhDEApNfFxwPow0b+b4FfDYJYDp4fbHmHBGbkK3QHMKdiPsI+Z7W3BOVRfdfdrCU6aEsd0zyLbUc9ehg13f9HMTgU6zKw2XPxNd/+9mf0/4HYzewn4TQnhbgfmmdnDwOME/0gG83XgCjNrJujxnw7cU9C+O82sDrgn/FWyCTgFOBD4NzN7g2Cmw9NL2JbIkGjWS8kEMxvj7pvC+en/C/iDu19c6XaJ7Cwq40hWnBbutH2UoESzQ91cZDhTz15EJAPUsxcRyQAlexGRDFCyFxHJACV7EZEMULIXEckAJXsRkQz4/wjdhou+UHRZAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df[df['Age'].notnull()].boxplot('Age','Pregnancies')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos['Age']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 36.5\n", "1 24.0\n", "2 43.0\n", "3 24.0\n", "4 25.0\n", " ... \n", "763 40.5\n", "764 25.0\n", "765 36.0\n", "766 24.0\n", "767 24.0\n", "Name: Age, Length: 768, dtype: float64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "por_embarazos['Age'].transform('median')" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50.0</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32.0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>24.0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33.0</td><td>1</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "0 0.627 50.0 1 \n", "1 0.351 31.0 0 \n", "2 0.672 32.0 1 \n", "3 0.167 24.0 0 \n", "4 2.288 33.0 1 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Age'].fillna(por_embarazos['Age'].transform('median'), inplace=True)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Valores atípicos\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAQRUlEQVR4nO3df2xdd3mA8efFTnEKCSWt27UJxUWqwJtRC7WqQqNpachYB6LZlkRYFYpWj0jt5P1gaGSzNIS2SK00LaBOyxZhpiAxtyXQH2pW2iozqSykMrspw8WwQmk7k9Ia1mRQIHXDuz98GprEqY8TX99+4+cjRffec8/pff96evS95x5HZiJJKs/rmj2AJOnUGHBJKpQBl6RCGXBJKpQBl6RCtS7kh5133nnZ0dGxkB8pScUbHR39UWa2H799QQPe0dHByMjIQn6kJBUvIp6aabtLKJJUKAMuSYUy4JJUKAMuSYUy4JJUKAOuRW1wcJCuri5aWlro6upicHCw2SNJtS3oZYTSa8ng4CD9/f0MDAywevVqhoeH6e3tBaCnp6fJ00mzi4W8nWx3d3d6HbheK7q6urj11ltZs2bN0W1DQ0P09fUxNjbWxMmkY0XEaGZ2H7+91hJKRPx5RDwWEWMRMRgRbRFxSUQ8HBGPR8TtEXHW/I8tNc74+DirV68+Ztvq1asZHx9v0kTS3Mwa8IhYCfwJ0J2ZXUAL8GHgFmB7Zl4KPA/0NnJQab51dnYyPDx8zLbh4WE6OzubNJE0N3W/xGwFlkZEK3A28AxwDbC7en8XsH7+x5Map7+/n97eXoaGhpiammJoaIje3l76+/ubPZpUy6xfYmbmDyLi74GngZ8DDwCjwMHMfKnabQJY2bAppQZ4+YvKvr4+xsfH6ezsZNu2bX6BqWLMGvCIeDNwHXAJcBD4InDtDLvO+G1oRGwBtgBcfPHFpzyo1Ag9PT0GW8Wqs4TyPuD7mTmZmVPAl4H3AudUSyoAq4ADMx2cmTszszszu9vbT7gboiTpFNUJ+NPAVRFxdkQEsBb4FjAEbKj22Qzc3ZgRJUkzmTXgmfkw019WPgJ8szpmJ/AJ4GMR8V3gXGCggXNKko5T65eYmflJ4JPHbX4CuHLeJ5Ik1eK9UCSpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwCWpUAZckgplwLWoDQ4O0tXVRUtLC11dXQwODjZ7JKm2Wn8TUzoTDQ4O0t/fz8DAAKtXr2Z4eJje3l4Aenp6mjydNLvIzAX7sO7u7hwZGVmwz5NeTVdXF7feeitr1qw5um1oaIi+vj7GxsaaOJl0rIgYzczuE7YbcC1WLS0t/OIXv2DJkiVHt01NTdHW1saRI0eaOJl0rJMF3DVwLVqdnZ0MDw8fs214eJjOzs4mTSTNjQHXotXf309vby9DQ0NMTU0xNDREb28v/f39zR5NqmXWLzEj4u3A7a/Y9Dbgb4DPV9s7gCeBTZn5/PyPKDVGT08PX/va17j22ms5fPgwr3/96/noRz/qF5gqxqxn4Jn5ncy8PDMvB64AfgbcCWwF9mbmpcDe6rVUjMHBQfbs2cN9993Hiy++yH333ceePXu8lFDFmOsSylrge5n5FHAdsKvavgtYP5+DSY22bds2BgYGWLNmDUuWLGHNmjUMDAywbdu2Zo8m1TKnq1Ai4nPAI5n5jxFxMDPPecV7z2fmm2c4ZguwBeDiiy++4qmnnpqHsaXT51UoKsVpX4USEWcBHwK+OJcPzsydmdmdmd3t7e1zOVRqqM7OTjZt2kRbWxsRQVtbG5s2bfIqFBVjLkso1zJ99v1s9frZiLgQoHp8br6Hkxpp5cqV3HXXXdxwww0cPHiQG264gbvuuouVK1c2ezSplrkEvAd45bc79wCbq+ebgbvnayhpIezbt4/rr7+ehx5
6iBUrVvDQQw9x/fXXs2/fvmaPJtVSK+ARcTawDvjyKzbfDKyLiMer926e//Gkxjl8+DBr1649ZtvatWs5fPhwkyaS5qbWzawy82fAucdt+zHTV6VIRWptbeXjH/84u3fvPnozqw0bNtDa6j3eVAZ/ialFa/ny5Rw6dIj9+/czNTXF/v37OXToEMuXL2/2aFIt3sxKi1ZLSwsXXXQRExMTR7etWrWKAwcOeBmhXlO8mZV0nKVLlzIxMcGNN97IwYMHufHGG5mYmGDp0qXNHk2qxYBr0XrhhRdYtmwZGzdu5Oyzz2bjxo0sW7aMF154odmjSbUYcC1q27dvp6+vj7a2Nvr6+ti+fXuzR5JqM+BatCKC0dFRxsbGOHLkCGNjY4yOjhIRzR5NqsWAa9Fat24dO3bs4KabbuLQoUPcdNNN7Nixg3Xr1jV7NKkWr0LRovb+97+fBx98kMwkIli3bh33339/s8eSjnGyq1D8xYLOSKeyDJKZPPDAA3M6diFPgKTjuYSiM1JmzunfWz9x75yPMd5qNgMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUKAMuSYUy4JJUqFoBj4hzImJ3RHw7IsYj4j0RsSIiHoyIx6vHNzd6WEnSr9Q9A/8M8JXMfAdwGTAObAX2ZualwN7qtSRpgcwa8IhYDvwmMACQmS9m5kHgOmBXtdsuYH2jhpQknajOGfjbgEngXyNif0R8NiLeAFyQmc8AVI/nz3RwRGyJiJGIGJmcnJy3wSVpsasT8Fbg3cCOzHwX8AJzWC7JzJ2Z2Z2Z3e3t7ac4piTpeHUCPgFMZObD1evdTAf92Yi4EKB6fK4xI0qSZjJrwDPzh8D/RMTbq01rgW8B9wCbq22bgbsbMqEkaUatNffrA74QEWcBTwB/yHT874iIXuBpYGNjRpQkzaRWwDPzUaB7hrfWzu84kqS6/CWmJBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoQy4JBXKgEtSoVrr7BQRTwI/AY4AL2Vmd0SsAG4HOoAngU2Z+XxjxpQkHW8uZ+BrMvPyzOyuXm8F9mbmpcDe6rUkaYGczhLKdcCu6vkuYP3pjyNJqqtuwBN4ICJGI2JLte2CzHwGoHo8f6YDI2JLRIxExMjk5OTpTyxJAmqugQNXZ+aBiDgfeDAivl33AzJzJ7AToLu7O09hRknSDGqdgWfmgerxOeBO4Erg2Yi4EKB6fK5RQ0qSTjRrwCPiDRGx7OXnwG8DY8A9wOZqt83A3Y0aUpJ0ojpLKBcAd0bEy/v/W2Z+JSL+E7gjInqBp4GNjRtTknS8WQOemU8Al82w/cfA2kYMJUmanb/ElKRCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKpQBl6RCGXBJKlRr3R0jogUYAX6QmR+MiEuA24AVwCPARzLzxcaMqcXssk89wKGfTzX8czq27mnof/9NS5fwjU/+dkM/Q4tL7YADfwqMA8ur17cA2zPztoj4Z6AX2DHP80kc+vkUT978gWaPcdoa/T8ILT61llAiYhXwAeCz1esArgF2V7vsAtY3YkBJ0szqroF/GvhL4JfV63OBg5n5UvV6Alg504ERsSUiRiJiZHJy8rSGlST9yqwBj4gPAs9l5ugrN8+wa850fGbuzMzuzOxub28/xTElScerswZ+NfChiPhdoI3pNfBPA+dERGt1Fr4KONC4MSVJx5v1DDwz/yozV2VmB/Bh4D8y83pgCNhQ7bYZuLthU0qSTnA614F/AvhYRHyX6TXxgfk
ZSZJUx1wuIyQzvwp8tXr+BHDl/I8kSarDX2JKUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVyoBLUqEMuCQVataAR0RbRHw9Ir4REY9FxKeq7ZdExMMR8XhE3B4RZzV+XEnSy+qcgR8GrsnMy4DLgd+JiKuAW4DtmXkp8DzQ27gxJUnHmzXgOe2n1csl1b8ErgF2V9t3AesbMqEkaUa11sAjoiUiHgWeAx4EvgcczMyXql0mgJUnOXZLRIxExMjk5OR8zCxJombAM/NIZl4OrAKuBDpn2u0kx+7MzO7M7G5vbz/1SSVJx5jTVSiZeRD4KnAVcE5EtFZvrQIOzO9okqRXU+cqlPaIOKd6vhR4HzAODAEbqt02A3c3akhJ0olaZ9+FC4FdEdHCdPDvyMx7I+JbwG0R8XfAfmCggXNqEVvWuZV37tra7DFO27JOgA80ewydQWYNeGb+F/CuGbY/wfR6uNRQPxm/mSdvLj98HVv3NHsEnWH8JaYkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1KhDLgkFcqAS1Kh6twPXGq6M+FWrG9auqTZI+gMY8D1mrcQ9wLv2LrnjLjnuBYXl1AkqVAGXJIKZcAlqVAGXJIKZcAlqVAGXJIKZcAlqVAGXJIKNesPeSLiLcDngV8DfgnszMzPRMQK4HagA3gS2JSZzzduVKm+iJj7MbfM/XMyc+4HSfOkzhn4S8BfZGYncBXwxxHx68BWYG9mXgrsrV5LrwmZuSD/pGaaNeCZ+UxmPlI9/wkwDqwErgN2VbvtAtY3akhJ0onmtAYeER3Au4CHgQsy8xmYjjxw/kmO2RIRIxExMjk5eXrTSpKOqh3wiHgj8CXgzzLz/+oel5k7M7M7M7vb29tPZUZJ0gxqBTwiljAd7y9k5perzc9GxIXV+xcCzzVmREnSTGYNeEx/nT8AjGfmP7zirXuAzdXzzcDd8z+eJOlk6twP/GrgI8A3I+LRattfAzcDd0REL/A0sLExI0qSZjJrwDNzGDjZRbVr53ccSVJd/hJTkgoVC/ljhIiYBJ5asA+U6jsP+FGzh5BO4q2ZecJlfAsacOm1KiJGMrO72XNIc+ESiiQVyoBLUqEMuDRtZ7MHkObKNXBJKpRn4JJUKAMuSYUy4Fo0IuL3IiIj4h3NnkWaDwZci0kPMAx8uNmDSPPBgGtRqO5nfzXQSxXwiHhdRPxTRDwWEfdGxL9HxIbqvSsiYl9EjEbE/S/fOll6LTHgWizWA1/JzP8G/jci3g38PtN/lPudwB8B74Gj97+/FdiQmVcAnwO2NWNo6dXUuZ2sdCboAT5dPb+ter0E+GJm/hL4YUQMVe+/HegCHqz+un0L8MzCjivNzoDrjBcR5wLXAF0RkUwHOYE7T3YI8FhmvmeBRpROiUsoWgw2AJ/PzLdmZkdmvgX4PtN3H/yDai38AuC3qv2/A7RHxNEllYj4jWYMLr0aA67FoIcTz7a/BFwETABjwL8ADwOHMvNFpqN/S0R8A3gUeO/CjSvV40/ptahFxBsz86fVMsvXgasz84fNnkuqwzVwLXb3RsQ5wFnA3xpvlcQzcEkqlGvgklQoAy5JhTLgklQoAy5JhTLgklSo/wem4dYgJy8DxAAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df['Age'].plot.box()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>123</th><td>5</td><td>132</td><td>80</td><td>0</td><td>0</td><td>26.8</td><td>0.186</td><td>69</td><td>0</td></tr>\n",
"    <tr><th>363</th><td>4</td><td>146</td><td>78</td><td>0</td><td>0</td><td>38.5</td><td>0.520</td><td>67</td><td>1</td></tr>\n",
"    <tr><th>453</th><td>2</td><td>119</td><td>0</td><td>0</td><td>0</td><td>19.6</td><td>0.832</td><td>72</td><td>0</td></tr>\n",
"    <tr><th>459</th><td>9</td><td>134</td><td>74</td><td>33</td><td>60</td><td>25.9</td><td>0.460</td><td>81</td><td>0</td></tr>\n",
"    <tr><th>489</th><td>8</td><td>194</td><td>80</td><td>0</td><td>0</td><td>26.1</td><td>0.551</td><td>67</td><td>0</td></tr>\n",
"    <tr><th>537</th><td>0</td><td>57</td><td>60</td><td>0</td><td>0</td><td>21.7</td><td>0.735</td><td>67</td><td>0</td></tr>\n",
"    <tr><th>666</th><td>4</td><td>145</td><td>82</td><td>18</td><td>0</td><td>32.5</td><td>0.235</td><td>70</td><td>1</td></tr>\n",
"    <tr><th>674</th><td>8</td><td>91</td><td>82</td><td>0</td><td>0</td><td>35.6</td><td>0.587</td><td>68</td><td>0</td></tr>\n",
"    <tr><th>684</th><td>5</td><td>136</td><td>82</td><td>0</td><td>0</td><td>0.0</td><td>0.640</td><td>69</td><td>0</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "123 5 132 80 0 0 26.8 \n", "363 4 146 78 0 0 38.5 \n", "453 2 119 0 0 0 19.6 \n", "459 9 134 74 33 60 25.9 \n", "489 8 194 80 0 0 26.1 \n", "537 0 57 60 0 0 21.7 \n", "666 4 145 82 18 0 32.5 \n", "674 8 91 82 0 0 35.6 \n", "684 5 136 82 0 0 0.0 \n", "\n", " DiabetesPedigreeFunction Age Outcome \n", "123 0.186 69 0 \n", "363 0.520 67 1 \n", "453 0.832 72 0 \n", "459 0.460 81 0 \n", "489 0.551 67 0 \n", "537 0.735 67 0 \n", "666 0.235 70 1 \n", "674 0.587 68 0 \n", "684 0.640 69 0 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Identificación basada en percentiles (también existe la basada en la desviación estándar)\n", "q3 = df['Age'].quantile(.75)\n", "q1 = df['Age'].quantile(.25)\n", "\n", "IQR = q3 - q1\n", "\n", "df.loc[(df['Age'] > q3 + 1.5 * IQR) | (df['Age'] < q1 - 1.5 * IQR)]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(759, 9)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df = df.loc[(df['Age'] <= q3 + 1.5 * IQR) & (df['Age'] >= q1 - 1.5 * IQR)]\n", "df.loc[(df['Age'] <= q3 + 1.5 * IQR) & (df['Age'] >= q1 - 1.5 * IQR)].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Binning\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](https://www.saedsayad.com/images/Binning_1.png)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(\"diabetes.csv\"))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>count</th><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td><td>768.000000</td></tr>\n",
"    <tr><th>mean</th><td>3.845052</td><td>120.894531</td><td>69.105469</td><td>20.536458</td><td>79.799479</td><td>31.992578</td><td>0.471876</td><td>33.240885</td><td>0.348958</td></tr>\n",
"    <tr><th>std</th><td>3.369578</td><td>31.972618</td><td>19.355807</td><td>15.952218</td><td>115.244002</td><td>7.884160</td><td>0.331329</td><td>11.760232</td><td>0.476951</td></tr>\n",
"    <tr><th>min</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.078000</td><td>21.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>25%</th><td>1.000000</td><td>99.000000</td><td>62.000000</td><td>0.000000</td><td>0.000000</td><td>27.300000</td><td>0.243750</td><td>24.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>50%</th><td>3.000000</td><td>117.000000</td><td>72.000000</td><td>23.000000</td><td>30.500000</td><td>32.000000</td><td>0.372500</td><td>29.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>75%</th><td>6.000000</td><td>140.250000</td><td>80.000000</td><td>32.000000</td><td>127.250000</td><td>36.600000</td><td>0.626250</td><td>41.000000</td><td>1.000000</td></tr>\n",
"    <tr><th>max</th><td>17.000000</td><td>199.000000</td><td>122.000000</td><td>99.000000</td><td>846.000000</td><td>67.100000</td><td>2.420000</td><td>81.000000</td><td>1.000000</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin \\\n", "count 768.000000 768.000000 768.000000 768.000000 768.000000 \n", "mean 3.845052 120.894531 69.105469 20.536458 79.799479 \n", "std 3.369578 31.972618 19.355807 15.952218 115.244002 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 99.000000 62.000000 0.000000 0.000000 \n", "50% 3.000000 117.000000 72.000000 23.000000 30.500000 \n", "75% 6.000000 140.250000 80.000000 32.000000 127.250000 \n", "max 17.000000 199.000000 122.000000 99.000000 846.000000 \n", "\n", " BMI DiabetesPedigreeFunction Age Outcome \n", "count 768.000000 768.000000 768.000000 768.000000 \n", "mean 31.992578 0.471876 33.240885 0.348958 \n", "std 7.884160 0.331329 11.760232 0.476951 \n", "min 0.000000 0.078000 21.000000 0.000000 \n", "25% 27.300000 0.243750 24.000000 0.000000 \n", "50% 32.000000 0.372500 29.000000 0.000000 \n", "75% 36.600000 0.626250 41.000000 1.000000 \n", "max 67.100000 2.420000 81.000000 1.000000 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th><th>YoungAdult</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td><td>1</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome YoungAdult \n", "0 0.627 50 1 0 \n", "1 0.351 31 0 1 \n", "2 0.672 32 1 1 \n", "3 0.167 21 0 1 \n", "4 2.288 33 1 1 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['YoungAdult'] = df['Age'].map(lambda age: 1 if age <= 35 else 0 ) # age <= 35 ? 1 : 0\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(270, 10)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['YoungAdult'] == 0].shape" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(498, 10)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['YoungAdult'] == 1].shape" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 low\n", "1 low\n", "2 low\n", "3 low\n", "4 very_low\n", " ... \n", "763 high\n", "764 low\n", "765 low\n", "766 very_low\n", "767 low\n", "Name: BloodPressure, Length: 768, dtype: category\n", "Categories (4, object): [very_low < low < high < very_high]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df['BloodPressure_Bin'] = pd.qcut(df['BloodPressure'], 4, labels=['very_low','low','high','very_high'])\n", "pd.qcut(df['BloodPressure'], 4, labels=['very_low','low','high','very_high'])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th><th>YoungAdult</th><th>AgeCategogy</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td><td>0</td><td>middle</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td><td>1</td><td>young</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td><td>1</td><td>young</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td><td>1</td><td>young</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td><td>1</td><td>young</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome YoungAdult AgeCategogy \n", "0 0.627 50 1 0 middle \n", "1 0.351 31 0 1 young \n", "2 0.672 32 1 1 young \n", "3 0.167 21 0 1 young \n", "4 2.288 33 1 1 young " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['AgeCategogy'] = pd.cut(df['Age'],bins=[0, 35, 55, 120], labels=['young', 'middle', 'old'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformación logarítmica\n", "\n", "\n", "\n", "Recuerde que log(0) = infinito" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD4CAYAAAD2FnFTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de3icZZ3/8fd3cm5OPaXHtLSlSWlB5BCKrlB1VQR3l8oKUtQFXBRXwMNPcReUFcR1xSOeEOkuKKBykEWtKyvK4gGVhaYUSg9pm7Zpmx7Tpk2bpskkk+/vj5nUNJ0kkzTPPJPk87quuWbmOcx8O9d0Pnnu+7nvx9wdERGRniJhFyAiIplJASEiIkkpIEREJCkFhIiIJKWAEBGRpLLDLmCoTJw40WfNmhV2GSIiw8qKFSv2uXtZsnUjJiBmzZpFdXV12GWIiAwrZra1t3VqYhIRkaQUECIikpQCQkREklJAiIhIUgoIERFJSgEhIiJJKSBERCQpBYQEYltrK1/dto2f79uHppQXGZ5GzEA5yRzrjhzhwpUr2d/RAcD1U6fyvcpKzCzkykRkIHQEIUPK3Xl/TQ1mxqtVVdw8YwZLd+3iiYaGsEsTkQFSQMiQevbgQV44fJgvzp7NGUVF3DVnDq8tLOSTmzbRGouFXZ6IDIACQobUt+vrmZiTw/smTwYgy4yvnHoq29vaeGTv3pCrE5GBUEDIkDnQ3s5/79/P+6dMIT8r69jyt44bxxmFhXyjvl4d1iLDiAJChsyvGhuJAZdNnHjccjPjxmnTWHXkCK80N4dTnIgMmAJChsyy/fuZlJPDwpKSE9ZdXlZGtpmamUSGEQWEDAl355kDB7h4/HiykpzOOjE3l7eNG8eje/fSqWYmkWFBASFDYsPRo+xrb+fC0tJet7lq0iS2tbXx4qFDaaxMRAYr0IAws4vNbL2Z1ZrZLUnWLzKzl8ysw8wuT7K+xMx2mNl3gqxTTt
4fm5oAuKCPgHjHhAlEgKcaG9NUlYicjMACwsyygHuAS4AFwFVmtqDHZtuAa4Ef9/Iynwd+H1SNMnT+2NTEhOxs5o0Z0+s2E3JyOL+khP9RQIgMC0EeQSwEat19s7tHgUeBxd03cPc6d18FdPbc2czOBSYDvw6wRhkiyw8d4vySkn6n03jH+PFUHz7Mnmg0TZWJyGAFGRDTge3dntcnlvXLzCLA14BP9bPd9WZWbWbVDZrKITStsRg1LS2cXVTU77aXTJgAwNM6ihDJeEEGRLI/JVM9feUG4Cl3397XRu6+1N2r3L2qrKxswAXK0Fjb0kIMeG0KAXF2URGTc3LUzCQyDAQ5m2s9MKPb83JgZ4r7vh640MxuAIqAXDNrdvcTOrolfC8nBr+lEhARM946bhzPHDiAu2uGV5EMFuQRxHKgwsxmm1kusARYlsqO7v5ed5/p7rOAm4GHFA6Z65XmZgojEU4tKEhp+zePG8ee9nZqWloCrkxETkZgAeHuHcBNwNPAOuBxd19jZnea2aUAZnaemdUDVwD3mdmaoOqR4LzS3MxrioqSDpBL5k1jxwLwu4MHgyxLRE5SoBcMcvengKd6LPtst8fLiTc99fUaPwB+EEB5MgTcnVeOHOHKAfQBzcnPpzwvj98dPMiHp6d03oKIhEAjqeWkNLS3c7CjgwWFhSnvY2a8aexYfnfwoGZ3FclgCgg5KesT/QiVKfY/dHnT2LHsVT+ESEZTQMhJ2XD0KACVfYygTkb9ECKZTwEhJ2VDSwu5ZpySnz+g/br3Q4hIZlJAyEnZcPQocwsKUj6DqYuZsai0lOeamtQPIZKhFBByUja0tAy4eanLhaWl7IpG2ZRophKRzKKAkEGLuVN79OiAO6i7LEr0Q/whMVW4iGQWBYQM2rbWVqLugz6CmD9mDBNzcnhOASGSkRQQMmibW1sBOHWAHdRdzIwLS0v5gzqqRTKSAkIGrS4RELMGGRAAi0pL2dzaSn3itUQkcyggZNDqWlvJAsrz8gb9Ghcm+iHUzCSSeRQQMmh1ra2U5+WRHRn81+i1hYUUZ2UpIEQykAJCBq2utfWkmpcAsiMR3qB+CJGMpICQQRuKgIB4P8Salhb26TrVIhlFASGDEu3sZEdb25AExIWlpQD8Uc1MIhlFASGDsr2tDefkzmDqcl5JCXlm6ocQyTAKCBmUrlNcBzpJXzJ5kQivKynRiGqRDKOAkEEZijEQ3S0aO5aXDh/mcEfHkLyeiJy8QAPCzC42s/VmVmtmtyRZv8jMXjKzDjO7vNvys8zseTNbY2arzOzKIOuUgatrbSXCyY2B6G5RaSmdwJ8PHRqS1xORkxdYQJhZFnAPcAmwALjKzBb02GwbcC3w4x7LW4Cr3f104GLgG2Y2NqhaZeC6xkDknMQYiO5eV1JCFuh0V5EMkh3gay8Eat19M4CZPQosBtZ2beDudYl1nd13dPcN3R7vNLO9QBmgX48Msb21lZlD1LwEUJSdzbnFxeqoFskgQTYxTQe2d3ten1g2IGa2EMgFNiVZd72ZVZtZdUNDw6ALlYGrb2tjem7ukL7morFjeeHQIVpjsSF9XREZnCADItklxgZ06TAzmwo8DLzf3Tt7rnf3pe5e5e5VZWVlgyxTBsrd2RGNMn2I+h+6LCotJerOi4cPD+nrisjgBBkQ9cCMbs/LgZ2p7mxmJcAvgdvc/f+GuDY5CQc6Omjt7ByyDuouF5SWYqgfQiRTBBkQy4EKM5ttZrnAEmBZKjsmtv8p8JC7/yTAGmUQdrS1AQz5EcS4nBzOKCxUP4RIhggsINy9A7gJeBpYBzzu7mvM7E4zuxTAzM4zs3rgCuA+M1uT2P3dwCLgWjN7OXE7K6haZWCCCgiINzP9qamJjs4TWhRFJM2CPIsJd38KeKrHss92e7yceNNTz/1+CPwwyNpk8Oq7AmKIO6kh3lF9z86drGxu5rySkiF/fRFJnUZSy4DtSMy6Oi2AI4iuifs07YZI+BQQMm
A72tqYlJND7hANkutual4ecwsKeE4d1SKhU0DIgO1oawuk/6HLotJSnmtqotMHdFa0iAwxBYQMWOABMXYsjR0drD1yJLD3EJH+KSBkwHZEo4F0UHdZpH4IkYyggJABaY3F2NfePuSD5LqblZ9PeV6eBsyJhEwBIQOyM3EGU5BNTGbGhYl+CFc/hEhoFBAyIEEOkutuUWkpO6NRNicuTCQi6aeAkAFJW0CMjV/+4/dqZhIJjQJCBqRrkFyQfRAA88eMYUpuLr9ubAz0fUSkdwoIGZCdbW2MiUQoycoK9H3MjEvGj+fpAwc0L5NISBQQMiC7o1Gm5OZiluxyH0PrbyZM4GBHB8/rOtUioVBAyIB0BUQ6vG3cOLLN+OX+/Wl5PxE5ngJCBmRPGgOiJDubC0tL+aX6IURCoYCQAUnnEQTEm5lWHznCNp3uKpJ2CghJWbSzk/0dHUxOZ0CMHw+gZiaRECggJGV7E6e4pvMIYt6YMZyan8/P9+1L23uKSFygAWFmF5vZejOrNbNbkqxfZGYvmVmHmV3eY901ZrYxcbsmyDolNbtDCAgz411lZfzvwYM0tren7X1FJMCAMLMs4B7gEmABcJWZLeix2TbgWuDHPfYdD9wOnA8sBG43s3FB1Sqp2ZP4gU5nQABcXlZGh7uOIkTSLMgjiIVArbtvdvco8CiwuPsG7l7n7quAniOh3g78xt0b3f0A8Bvg4gBrlRR0HUGksw8CoKq4mFn5+fykoSGt7ysy2gUZENOB7d2e1yeWDdm+Zna9mVWbWXWDfjwCdywgcnLS+r5mxuVlZTxz4AAH1MwkkjZBBkSyobapzt2c0r7uvtTdq9y9qqysbEDFycDtjkYZm51NfsDTbCRzeVkZ7WpmEkmrIAOiHpjR7Xk5sDMN+0pA0jlIrqeFxcXMyc/n4T17Qnl/kdEoyIBYDlSY2WwzywWWAMtS3Pdp4CIzG5fonL4osUxCtDsaTXvzUhcz4+opU/jtwYMaNCeSJoEFhLt3ADcR/2FfBzzu7mvM7E4zuxTAzM4zs3rgCuA+M1uT2LcR+DzxkFkO3JlYJiFK9yjqnq6ePBkHHUWIpEl2kC/u7k8BT/VY9tluj5cTbz5Ktu8DwANB1icDE3ZAzC4o4I2lpTy4ezefnjkzLTPKioxmGkktKWmJxTgci4UaEADXTJnCxqNH+bOmABcJnAJCUrInpDEQPV1RVkZJVhbf3bEj1DpERgMFhKQkjGk2kinKzubaKVP4SUPDsdASkWAoICQlmRIQADdMn067O/+5a1fYpYiMaAoIScmeDAqIeWPG8NZx4/jezp26XrVIgBQQkpLd0SgGlIU0DqKnG6dNo76tjWW6ToRIYBQQkpLd0SgTc3LIjmTGV+ZvJ0xgdn4+X962DfdUZ3ARkYHIjP/tkvHCHgPRU3YkwqdmzOCFw4f5Q1NT2OWIjEgKCEnJnvb2jAoIgGunTGFSTg53bdsWdikiI5ICQlKyOxoNfQxETwVZWXysvJxfNTbySnNz2OWIjDgKCOmXu2dcE1OXG6ZNozgrS0cRIgFQQEi/DsVitHZ2ZmRAjM3J4YZp03hs717WHjkSdjkiI0pKAWFm/2Vmf2NmCpRRKJPGQCTzqZkzKcrK4va6urBLERlRUv3Bvxd4D7DRzO4ys9MCrEkyTCaNok5mQk4OHy8v54mGBl4+fDjsckRGjJQCwt2fcff3AucAdcBvzOzPZvZ+M8uMkVMSmLCuRT0QnygvZ2x2Np/VUYTIkEm5ycjMJgDXAh8AVgLfJB4YvwmkMskYmX4EAfG+iE/NmMEv9u/nBU0FLjIkUu2DeBJ4DhgD/J27X+ruj7n7R4CiIAuU8O2JRsk2Y3wGH0EAfHT6dCbm5HDbli1hlyIyIqR6BPGf7r7A3b/o7rsAzCwPwN2rAqtOMkLXtagjGX4Ft6LsbD4zcybPHDjA0426Qq3IyUo1IP4tybLn+9vJzC42s/VmVmtmtyRZn2dmjyXWv2BmsxLLc8
zsQTN71czWmdmtKdYpAcjEQXK9uWH6dE7Nz+fmTZuIaY4mkZPSZ0CY2RQzOxcoMLOzzeycxO1NxJub+to3C7gHuARYAFxlZgt6bHYdcMDd5wJ3A19KLL8CyHP31wDnAh/qCg9Jv0wdJJdMbiTCXXPmsPrIEb6v60WInJTsfta/nXjHdDnw9W7LDwOf7mffhUCtu28GMLNHgcXA2m7bLAbuSDx+AviOxa9E70ChmWUDBUAUUM9jSPZEo5xVNHy6mt5VVsYbSkr417o6lkyaRFF2f19zEUmmzyMId3/Q3d8MXOvub+52u9Tdn+zntacD27s9r08sS7qNu3cATcAE4mFxBNgFbAO+6u4nNCqb2fVmVm1m1Q0NDf2UI4PR6Z6RE/X1xcz42ty57I5G+fL27f3vICJJ9fmnlZm9z91/CMwys0/0XO/uX0+y27Hdkyzr2Sjc2zYLgRgwDRgHPGdmz3QdjXR7/6XAUoCqqio1OAegsb2dDvdh0wfR5fySEpZMmsRXt2/n+qlTKc/PD7skkWGnv07qwsR9EVCc5NaXemBGt+flwM7etkk0J5UCjcRHbf/K3dvdfS/wJ0BnS4VgOIyB6M0XZ8/GgU9u2hR2KSLDUp9HEO5+X+L+c4N47eVAhZnNBnYAS4j/8He3DLiG+BlRlwPPurub2Tbgr83sh8Q7w18HfGMQNchJ2tPeDgzPgJhVUMCnZ87ks3V1fLCxkbeOHx92SSLDSqoD5b5sZiWJ00//18z2mdn7+ton0adwE/A0sA543N3XmNmdZnZpYrP7gQlmVgt8Aug6FfYe4kctq4kHzffdfdWA/3Vy0obzEQTAp2bMYG5BATdu3EhbZ2fY5YgMK6me3nGRu/+zmV1GvFnoCuC3wA/72sndnwKe6rHss90etyZeq+d+zcmWS/odm4dpmAZEflYW3547l0tefZWvb9/OraecEnZJIsNGqgPluuZYeAfwSLIzimRk2h2Nkh+JUJKVFXYpg3bxhAn8/cSJfH7rVra2toZdjsiwkWpA/MLMaoh3FP+vmZUB+p82CnQNkrMMn2ajP3fPnYsBN27YgGuEtUhKUp3u+xbg9UCVu7cTH6OwOMjCJDPsGUajqPsyMz+fL8yezS8bG/nRnj1hlyMyLAxkiOl84uMhuu/z0BDXIxlmdzTKqQUFYZcxJD5SXs7jDQ18tLaWt44bx5S8vLBLEsloqZ7F9DDwVeAC4LzETeMSRoHhNA9Tf7LMeGDePFpiMW7YuFFNTSL9SPUIogpY4PofNaq0d3ayb5hNs9Gf0woL+dzs2dyyeTOPNzRw5aRJYZckkrFS7aReDUwJshDJPA3t7TjDdwxEbz5ZXs7C4mI+vGED23VWk0ivUg2IicBaM3vazJZ13YIsTMI3HK5FPRjZkQg/mj+fdnfet26drhsh0otUm5juCLIIyUzDfRR1X+aOGcM9FRVcU1PDF7du5bZZs8IuSSTjpHqa6++BOiAn8Xg58FKAdUkGGMkBAfAPkyfznkmTuKOujj83NYVdjkjGSfUspg8Sv0bDfYlF04GfBVWUZIbhPs1Gf8yMeysrmZmfz5K1a2lI/HtFJC7VPogbgTeQuKqbu28EdPrHCLc7GqU0K4uCYTzNRn9KsrN54vTTaWhvZ8natXRoQj+RY1INiDZ3P/bnVWKwnHr2RriRMoq6P+cUF/O9ykqePXiQT2/ZEnY5Ihkj1YD4vZl9Gigws7cBPwF+EVxZkglG0iC5/lwzZQo3TJvGV7Zv5yd794ZdjkhGSDUgbgEagFeBDxGfwvu2oIqSzDCaAgLiE/q9vqSE99fUsObIkbDLEQldqmcxdRLvlL7B3S939//QqOqRb7QFRG4kwhOnn05RVhbvXL2aA4mr6YmMVn0GhMXdYWb7gBpgvZk1mNln+9pPhr+WWIxDsdiIPYOpN9Py8njyjDPY2trKezSITka5/o4gPk787KXz3H2Cu48HzgfeYGb/r78XN7OLzWy9mdWa2S1J1ueZ2W
OJ9S+Y2axu6840s+fNbI2ZvWpm+QP6l8lJ2TPCx0D05a9KS7mnooJfNTbymc2bwy5HJDT9BcTVwFXufuzUDnffDLwvsa5XZpZF/NrSlwALgKvMbEGPza4DDrj7XOBu4EuJfbOJX870n9z9dOBNgI7302ikD5LrzwenTeOfpk3jS9u385g6rWWU6i8gctx9X8+F7t7AXy5D2puFQK27b06cIvsoJ15kaDHwYOLxE8BbLH7psouAVe7+SuL99rt7rJ/3kyE02gMC4Jtz53JBaSnvr6nh5cOHwy5HJO36C4i+hpb2N+x0OrC92/P6xLKk27h7B9AETAAqAU9MDviSmf1zsjcws+vNrNrMqhsaGvopRwZCAfGXTuvx2dm8c/Vq9mmktYwy/QXEa83sUJLbYeA1/eyb7CLGPXv8etsmm/jFid6buL/MzN5ywobuS929yt2rysrK+ilHBmJPNIoBZSNsJteBmpyby8/OOIPd0SjvXruWdo20llGkz4Bw9yx3L0lyK3b3/n456oEZ3Z6XAzt72ybR71AKNCaW/97d97l7C/FxF+ek/s+Sk7U7GqUsJ4fsSKpDZUauqpISls6bx28PHuRTmzaFXY5I2gT5v385UGFms80sF1gC9LyGxDLgmsTjy4FnE+MrngbONLMxieB4I7A2wFqlh9E2BqI/V0+ZwsfLy/nmjh08uHt32OWIpEWq14MYMHfvMLObiP/YZwEPuPsaM7sTqHb3ZcD9wMNmVkv8yGFJYt8DZvZ14iHjwFPu/sugapUTKSBO9JU5c1jV3MyH1q9n/pgxLCwpCbskkUDZSBkQXVVV5dXV1WGXMWLMev55Fo0dy0Pz54ddSkbZF41y3ksv0d7ZSfW55zIlLy/skkROipmtcPeqZOvUwCwncHd2RaNM1RHECSYmOq0PdHRw+Zo16rSWEU0BISfY395O1J3p+us4qdcWFfHAaafxp0OHuL2uLuxyRAKjgJAT7Eyc7z9NRxC9unLSJD4wdSp3bdvGbw8cCLsckUAoIOQEO9ragPjEddK7b8ydS2VBAf+wbh37NfOrjEAKCDlB1xGEmpj6VpiVxSMLFtDQ3s4H1q9npJzwIdJFASEn6DqCUCd1/84uLuauOXP42b593Lez5zhQkeFNASEn2NnWRllODrkaRZ2Sj5WXc9G4cXxy0yY2HT0adjkiQ0a/AHKCHdGoOqgHIGLG/fPmkWPG+2tq6FRTk4wQCgg5wc62NvU/DFB5fj7frKjguaYmvlVfH3Y5IkNCASEn2NHWpjOYBuHqyZP52wkTuHXLFja0tIRdjshJU0DIcdo7O9nb3s50NTENmJmxtLKSgkiEa2tqdD1rGfYUEHKc3dEojsZADNbUvDy+U1HB84cO8fXt2/vfQSSDKSDkOBoDcfKumjSJyyZO5F+3bGHtkSNhlyMyaAoIOc6xUdRqYho0M+PeykqKsrK4tqaGDk3oJ8OUAkKOs1PTbAyJybm5fLeykuWHD/NVNTXJMKWAkOPsiEbJNhv116IeCu+eNInLy8q4va6O1c3NYZcjMmAKCDnOzrY2pubmEjELu5QR4bsVFZRmZ3NtTY2uHSHDjgJCjrOjrU39D0OoLDeXeysrWdHczJe2bQu7HJEBCTQgzOxiM1tvZrVmdkuS9Xlm9lhi/QtmNqvH+plm1mxmNwdZp/zF9rY2Zubnh13GiPKusjKWTJrEnVu3skpNTTKMBBYQZpYF3ANcAiwArjKzBT02uw444O5zgbuBL/VYfzfwP0HVKMdzd7a1tTFTHdRD7ttz5zIuO5tr1NQkw0iQRxALgVp33+zuUeBRYHGPbRYDDyYePwG8xSze+G1m7wQ2A2sCrFG6aWhvp7Wzk1N0BDHkJubm8r3KSl5ububf1dQkw0SQATEd6H5+X31iWdJt3L0DaAImmFkh8C/A5/p6AzO73syqzay6oaFhyAofrba1tgKoiSkgl5WV8d5Jk/i3rVtZefhw2OWI9CvIgEh2GkzPyWl62+
ZzwN3u3meDrbsvdfcqd68qKysbZJnSZVtiDISamILzrYoKJubkcG1NDVE1NUmGCzIg6oEZ3Z6XAz0vuXVsGzPLBkqBRuB84MtmVgd8HPi0md0UYK3CX44g1MQUnPE5OSytrGTVkSN8fuvWsMsR6VOQAbEcqDCz2WaWCywBlvXYZhlwTeLx5cCzHnehu89y91nAN4B/d/fvBFirAFvb2iiMRBiXnR12KSPa302cyDWTJ/PFrVtZfuhQ2OWI9CqwgEj0KdwEPA2sAx539zVmdqeZXZrY7H7ifQ61wCeAE06FlfTZ1trKzPx8TIPkAveNuXOZmpfH1TU1HI3Fwi5HJKlA/1R096eAp3os+2y3x63AFf28xh2BFCcn2NbWpualNBmbk8P98+bx9lWruG3LFr42d27YJYmcQCOp5Zhtra3qoE6ji8aP58PTpnF3fT1/OHgw7HJETqCAEACOxmLsbW/XKa5p9uU5c5iTn8+1NTU0d3SEXY7IcRQQAsSn2AA4RUcQaVWUnc0PTjuNutZWbt60KexyRI6jgBAAtmqQXGguGDuWT86YwX27dvF0Y2PY5Ygco4AQADYnAmKOAiIUn581iwVjxnBdTQ0H2tvDLkcEUEBIQu3Ro+SZ6UpyIcnPyuKh+fPZHY3ysdrasMsRARQQkrDp6FHmFBToQkEhOre4mNtOOYWH9+zhp5pbTDKAAkKAeECcWlAQdhmj3mdOOYWzi4r40IYN7I1Gwy5HRjkFhODu8YBQ/0PociIRHjrtNJo6Ovjwhg2495zfUiR9FBDC3vZ2jnR26ggiQ5xRVMTnZ8/myX37eGTv3rDLkVFMASFsOnoUQAGRQT45YwbnFxfzkY0b2aOmJgmJAkIUEBkoy4zvn3YaR2IxblBTk4REASHUHj1KBJilPoiMMr+wkDtmzeLJffv4ic5qkhAoIIR1LS3Mzs8nL6KvQ6a5ecYMqoqLuXHjRhrU1CRppl8EYV1LC/MLC8MuQ5LIjkT4/rx5NHV08JGNG8MuR0YZBcQo19HZyYaWFuaPGRN2KdKLM4qKuH3WLB5raOBJNTVJGikgRrktra1E3RUQGe6fZ8zg7KIiPrxhA/s1V5OkSaABYWYXm9l6M6s1sxMuJ2pmeWb2WGL9C2Y2K7H8bWa2wsxeTdz/dZB1jmbrWloAFBAZLicS4QennUZjRwcfVVOTpElgAWFmWcA9wCXAAuAqM1vQY7PrgAPuPhe4G/hSYvk+4O/c/TXANcDDQdU52h0LCPVBZLwzi4q47ZRT+PHevfx8376wy5FRIMgjiIVArbtvdvco8CiwuMc2i4EHE4+fAN5iZubuK919Z2L5GiDfzDTNaABqWlqYmptLaXaglyeXIXLrzJm8trCQ69evZ1fiIk8iQQkyIKYD27s9r08sS7qNu3cATcCEHtu8C1jp7if8bzCz682s2syqG9R5NyivNjdzuo4eho3cSIQfL1hAcyzGkrVr6ejsDLskGcGCDIhk80b3HA7a5zZmdjrxZqcPJXsDd1/q7lXuXlVWVjboQker9s5OXj1yhLOLisIuRQZgQWEh91VW8oemJm7bsiXscmQECzIg6oEZ3Z6XAzt728bMsoFSoDHxvBz4KXC1u+tivQFY29JC1F0BMQy9b8oUPjR1Kl/avp2Hdu8OuxwZoYIMiOVAhZnNNrNcYAmwrMc2y4h3QgNcDjzr7m5mY4FfAre6+58CrHFUW3n4MIACYpj6VkUFfz12LNetX88zupa1BCCwgEj0KdwEPA2sAx539zVmdqeZXZrY7H5ggpnVAp8Auk6FvQmYC/yrmb2cuE0KqtbRamVzM2MiESp0iuuwlBuJ8OQZZzB/zBgWr17NswcOhF2SjDA2UmaJrKqq8urq6rDLGFYWrVxJhzt/PuecsEuRk7AnGuWtr7zCxpYWHjv9dBZPnBh2STKMmNkKd69Ktk4jqUepmDsrm5s5R81Lw97k3Fx+d9ZZnFlUxDtXr+b2LVuIjZA//CRcCohRal
VzM82xGH9VWhp2KTIEJuTk8PuzzuLaKVO4c+tWFq1cyerm5rDLkmFOATFKPdfUBMCFCogRoyAriwfmzeOh005jfUsLZ69YwYc3bGB7a2vYpckwpYAYpYZAryMAAAnzSURBVP7Y1MTMvDxm6CJBI4qZ8Q9TplCzcCEfnDqV+3ftYu4LL3D9+vVsTEyrIpIqBcQo5O4819TEBTp6GLEm5uby3cpKas8/n+umTuWh3buZ9+KLXLFmDdWHDoVdngwTCohRqPboUXZHowqIUWBmfj7fraxk6+tfz60zZ/KbxkbOe+kl3vLyy/y6sVHXupY+KSBGoV8lBlW9bdy4kCuRdJmcm8sX5sxh2+tfz1fmzKGmpYW3r1rFuStW8OiePTrrSZJSQIxCTzU2UllQwFwNkBt1SrKzuXnmTDa/7nXcP28eLbEYV61bxwU660mSUECMMkdiMX574ADvmNBz0lwZTfIiEf5x6lTWLlzIQ6edRu3Ro5yzYgWfq6vTDLFyjAJilFm2bx9t7rxTo20FiCTOelp33nm8u6yMO+rqePMrr7BNp8YKCohR55G9eynPy9P4BznOxNxcfrhgAT+aP59Xmps5q7qan+oaK6OeAmIU2d3Wxq8aG7myrIyIJbsUh4x275k8mZVVVZxaUMDfr1nDDRs2cDQWC7ssCYkCYhS5b9cu2t25ftq0sEuRDHZqQQF/OvtsPjVjBvfu3Ml5K1aoA3uUUkCMEkdiMe7dsYOLx4+nUmcvST9yIxG+fOqpPH3mmexrb+e8l17i3h07NG5ilFFAjBLfqq9nT3s7t51yStilyDBy0fjxvHLeebyxtJQbNm7kolWrNGXHKKKAGAW2HD3KF7Zu5dIJE3iDOqdlgCbn5vLUmWdyT0UFLx46xGuWL+e2zZs52N4edmkSMAXECNcai/G+deuImPHtioqwy5FhKmLGDdOnU7NwIe8qK+ML27Yx+4UXuGPLFna2tYVdngREATGCtcRiXLl2LX8+dIj7581jpmZulZM0NS+PHy1YwMpzz+WNpaV8butWZj7/PItffZWHdu9mXzQadokyhLKDfHEzuxj4JpAF/Ke739VjfR7wEHAusB+40t3rEutuBa4DYsBH3f3pIGsdaf6vqYkPb9zIK83N3FNRwRWTdElvGTpnFRfzs9e8htqWFpbu2sUP9+xh2f79RIAzi4pYWFxMVXExFQUFzC4ooDwvjyydWj3sBHZNajPLAjYAbwPqgeXAVe6+tts2NwBnuvs/mdkS4DJ3v9LMFgCPAAuBacAzQKW793pC9mi+JnVLLMaeaJQtra28cOgQP9u3jxcPH2Zabi73VVbytxo1LQHrdOelw4f5xf79PH/oEMsPH+ZgR8ex9VnA2OxsShO3gkiELLP4DcgyI9uMHDPyIhFyIxHyzOL3kQhjIhEKs7KO3Xd/PCYri8JIJH7f7XFuRA0kqejrmtRBHkEsBGrdfXOiiEeBxcDabtssBu5IPH4C+I6ZWWL5o+7eBmwxs9rE6z0/1EU2trdzwcqVdMWku//lcbf7riA9btlJ7nPCsl5er699Yu609Jg758zCQr5+6ql8YOpUirMDPUgUAeJ9FFUlJVSVlADx7+qW1lY2Hz1KXWsrW9vaaGxvp6mjg4MdHbR2dhIj/v2NuhNzp8OddnfaOjuJdrtv7eykJRZjoDNEZZsdC5H8SISu45dj92YnLuu2LtnyTHVmURGPLFgw5K8b5K/HdGB7t+f1wPm9bePuHWbWBExILP+/HvtO7/kGZnY9cD3AzJkzB1VkthlnFBbGX6/rdUn+BUn25envSzeYfVKtoev5xJwcpuTmUp6Xx7nFxYzPyUn53y8SBDNjTkEBcwoKhuT1PBEkR2IxWmIxjiRC40gsRktn53H3xy1LbNua+COqvz/MSLZ8SP4FwZodUP9ikAGRLHR7fta9bZPKvrj7UmApxJuYBlogxKc/fvz00wezq4ikiZmRl2h+0h9A6RNkI109MKPb83JgZ2/bmF
k2UAo0priviIgEKMiAWA5UmNlsM8sFlgDLemyzDLgm8fhy4FmPH+ctA5aYWZ6ZzQYqgBcDrFVERHoIrIkp0adwE/A08ZMYHnD3NWZ2J1Dt7suA+4GHE53QjcRDhMR2jxPv0O4AbuzrDCYRERl6gZ3mmm6j+TRXEZHB6us0V50oLCIiSSkgREQkKQWEiIgkpYAQEZGkRkwntZk1AFuBicC+kMsZKNWcHsOt5uFWL6jmdBnKmk9x97JkK0ZMQHQxs+reeuQzlWpOj+FW83CrF1RzuqSrZjUxiYhIUgoIERFJaiQGxNKwCxgE1Zwew63m4VYvqOZ0SUvNI64PQkREhsZIPIIQEZEhoIAQEZGkRkxAmNkVZrbGzDrNrKrHulvNrNbM1pvZ28OqsS9mdoeZ7TCzlxO3d4RdUzJmdnHic6w1s1vCricVZlZnZq8mPteMnNHRzB4ws71mtrrbsvFm9hsz25i4HxdmjT31UnPGfo/NbIaZ/dbM1iV+Kz6WWJ6xn3MfNaflcx4xfRBmNh/oBO4Dbnb36sTyBcAjxK9pPQ14BqjMtOnDzewOoNndvxp2Lb0xsyxgA/A24hd1Wg5c5e5r+9wxZGZWB1S5e8YOhjKzRUAz8JC7n5FY9mWg0d3vSoTxOHf/lzDr7K6Xmu8gQ7/HZjYVmOruL5lZMbACeCdwLRn6OfdR87tJw+c8Yo4g3H2du69Psmox8Ki7t7n7FqCWeFjIwC0Eat19s7tHgUeJf75yktz9D8SvidLdYuDBxOMHif8wZIxeas5Y7r7L3V9KPD4MrCN+rfuM/Zz7qDktRkxA9GE6sL3b83rS+AEP0E1mtipx6J4xh7ndDKfPsjsHfm1mK8zs+rCLGYDJ7r4L4j8UwKSQ60lVpn+PMbNZwNnACwyTz7lHzZCGz3lYBYSZPWNmq5Pc+vor1pIsC6VdrZ/67wVOBc4CdgFfC6PGfmTMZzlAb3D3c4BLgBsTTSMSjIz/HptZEfBfwMfd/VDY9aQiSc1p+ZwDu+RoENz9rYPYrR6Y0e15ObBzaCoamFTrN7P/AP474HIGI2M+y4Fw952J+71m9lPiTWV/CLeqlOwxs6nuvivRFr037IL64+57uh5n4vfYzHKI/9D+yN2fTCzO6M85Wc3p+pyH1RHEIC0DlphZnpnNBiqAF0Ou6QSJL2aXy4DVvW0bouVAhZnNNrNc4tcQXxZyTX0ys8JE5x5mVghcRGZ+tsksA65JPL4G+HmItaQkk7/HZmbA/cA6d/96t1UZ+zn3VnO6PueRdBbTZcC3gTLgIPCyu789se4zwD8CHcQP0f4ntEJ7YWYPEz9cdKAO+FBXu2gmSZxO9w0gC3jA3b8Qckl9MrM5wE8TT7OBH2dizWb2CPAm4tM47wFuB34GPA7MBLYBV7h7xnQK91Lzm8jQ77GZXQA8B7xK/IxHgE8Tb9PPyM+5j5qvIg2f84gJCBERGVqjoYlJREQGQQEhIiJJKSBERCQpBYSIiCSlgBARkaQUECIikpQCQkREkvr/JgoDjqDQcA0AAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df['Pregnancies'].plot.density(color='c')" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9016739791518588" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Pregnancies'].skew()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD4CAYAAADhNOGaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deXxc5X3v8c9vZrTNpsWSF+RF3rGRZdkIY9ZACglkMU0CDTRpQgOF3oYmaZre0pRLKLlt8iJt2qYhDaShtyEJJClpcCgUCIEABYzlBe+2hLxbtiRb1jKLpJl57h8aucLI1liaM+fMnN/79dKLWc7M/MZI851nOc8jxhiUUkq5l8fuApRSStlLg0AppVxOg0AppVxOg0AppVxOg0AppVzOZ3cB56q6utrU1dXZXYZSSuWVDRs2dBljasa6L++CoK6ujubmZrvLUEqpvCIi+890n3YNKaWUy2kQKKWUy2kQKKWUy2kQKKWUy2kQKKWUy2kQKKWUy2kQKKWUy+XdeQRKucWJoSH+u6eHtnic3kSCoNfLIr+fy8vLKffpn67KHv1tUspB+hIJHu/o4JGjR1nX28tYu4WUiHBjTQ331dWxwO/PeY2q8GgQKOUA0WSSBw8f5usHDnAikaA+EOC+ujqurqhgid9Puc9HXzLJlv5+ft7Vxffb2/lZZycPzJ/P52prERG734LKYxoEStnsmePHuXPPHg4ODHBdVRX/Z84cLgmH3/XhXuXxcFVlJVdVVvLl2bO5Y88evtDays5IhO8sWoRHw0BNkAaBUjY5OTTE51tb+cGxYyz1+3mpsZH3VFRk9NjpJSU8WV/Pl/fu5esHDuAV4dsLF2rLQE2IBoFSNnirv5+PbdvG/oEB7pkzh3vmzKHEc26T+ESEr82bR8IY/vbgQZYGAny2ttaiilUh0yBQKsd+cPQod+7ZQ5XPx0uNjVxWXj6p5/v6vHnsikb5YmsrV5aXsywYzFKlyi30PAKlcsQYwz1tbXx61y4uCYfZ2NQ06RAA8IrwyOLFVPh8fGrXLhKpVBaqVW6iQaBUDgylUty2ezd/feAAt02fznMNDUwrLs7a89cUF/OdRYvY3N/Pd48cydrzKnfQIFDKYpFkkhu2beNfjx7lK3Pm8L3Fi/Gd43hAJj5aXc1vVVRw7759nBgayvrzq8KlQaCUhToGB7l682aePXGChxYt4r65cy2b2SMifHPBAroTCf7h0CFLXkMVJg0CpSzydizGZZs2sS0S4T/q67njvPMsf82GYJCPVlfzrUOH6EkkLH89VRg0CJSywPreXi7ZuJETQ0O8sHw5a6qrc/ba98yZQ08yybcPH87Za6r8pkGgVJb95/HjXLV5M0Gvl9dWruSS
LMwMOhcrQiGur6ri24cPM6QziFQGNAiUyhJjDA8ePsyarVtZ4vfz2ooVLLZpUbjP1tZydHCQX3R12fL6Kr9oECiVBdFkklt37eKulhY+MGUKLzU2Mr2kxLZ6rquqoq60lH/WqaQqAxoESk3S+t5eLtqwgUePHeO+ujqerK8naPN+AV4R7pwxgxdPnmRXJGJrLcr5NAiUmqCOwUH+uKWF1Rs30pNI8ExDA1+pq3PMKqC3Tp+OB/jhsWN2l6IcTtcaUuocpIzhtZ4efnjsGD/q6CCWTHLneefxtXnzHLdr2PSSEq6prOTHHR181cLzF1T+s7RFICLXichuEWkVkbvPctyNImJEpMnKepSaqF2RCPe0tTF/3Tqu2LyZR48d46PV1WxftYrvLFrkuBAY8Ylp09gbj/N6b6/dpSgHs+y3V0S8wIPAtcAhYL2IrDXG7DjtuBDwOWCdVbUoNRH9iQSPdXTwvfZ21vf14QGuqazkq3V1/HZ1te3jAJn4SHU1f+jx8KNjx7g0x9NYVf6w8jd5FdBqjGkDEJHHgRuAHacd91XgAeBLFtaiVMZiySTfSm8beTK9beTfzZ/PLVOnMsPGmUATEfL5+NCUKfy8q4t/WrjQMeMXylms7BqqBQ6Oun4ofdspIrICmGWMeepsTyQid4hIs4g0d3Z2Zr9SpdKae3tZ0dzM3W1tXBYO898rVrClqYkvzpqVdyEw4obqao4ODvKmdg+pM7AyCMb66mFO3SniAf4e+NPxnsgY87AxpskY01RTU5PFElW+iiST/Lyzk/88fpzBLJ09+2RXF5du2kQkleK5hgaeamjg0vLyvB9k/UBVFV7gyePH7S5FOZSVXUOHgFmjrs8ERp/dEgLqgZfSf2jTgbUissYY02xhXSrPbY9EeP9bb3F4cBCAeaWlPLZ0KavC4Qk/5xOdndy8YwcXBoM83dBAVVFRtsq1XWVREe+pqGBtVxdfmzfP7nKUA1nZIlgPLBSRuSJSDNwMrB250xjTY4ypNsbUGWPqgDcADQF1VtFkko9u20YKeL6hgSfr6zHAezZv5qXu7gk95+PHjvHx7du5OBTiueXLCyoERtxQXc2OaJTWaNTuUpQDWRYExpgEcBfwLLAT+KkxZruI3C8ia6x6XVXYvn7gAHtiMX60ZAnXVFWxprqadStXMq+0lDXbtrGpr++cnu/Ro0f5xM6dXF5ezn81NBDOg5lAE7FmyhRAu4fU2MQYM/5RDtLU1GSam7XR4Eb9iQSz33iDqysqeKK+/h33HYrHuWzTJgZSKV5buZJ5ZWXjPt8j7e3cvns3v1VZyZP19fi9XqtKd4Rl69czraiIXzU22l2KsoGIbDDGjHmuli4xofLGo8eO0Z1I8GezZr3rvpmlpfxXQwODxnDdli10pscPzuS7hw9z2+7dvL+qirUuCAGA91VW8mpPD7Fk0u5SlMNoEKi88eOODi7w+1l9hhOjlgQCPLVsGQcHBvjg1q1j7ttrjOH+ffv4Xy0tfHjKFH5RX0+ZC0IA4NrKSgaM4ZWeHrtLUQ6jQaDywuGBAV7t6eHjU6ee9bhLy8v56dKlvNXfz+qNG1k/au78wXicj2zbxlf27ePW6dN54oILKLFgE3mnurKigmIRnjtxwu5SlMMU5siYKjhr0xus3JTBeSQfrq7mheXLuWnHDlZt3MiyQIAyj4eN/f34RPi7+fP5k5kz8/78gHPl93q5rLyc5yc4u0oVLvd8HVJ57Vfd3cwuKcl4x6/LKyrYvWoV35g3j9qSEoJeL1+aNYudF13EF2fNcl0IjHhfZSVbIhGOjTOGotxFWwTK8ZLG8OuTJ/lYdfU5fYCHfT6+NHs2X5o928Lq8su1VVX8xd69/Kq7m09Mm2Z3OcohtEWgHG9jXx8nEwmuqay0u5S8tyIYpMLn4zcnT9pdinIQDQLleK+lB3yvqKiwuZL85xHhsnBYZw6pd9AgUI63rreXmSUl1Obp6p9Oc2VFBbuiUTp0nEClaRAo
x1vX28vFoZDdZRSMK9LnYbyqrQKVpkGgHK1zcJC2eJyLJ7GyqHqnC0MhyjweXtZxApWmQaAcbV16fECDIHuKPR5W6ziBGkWDQDlac3qv4Au1ayirriwvZ3N/P72JhN2lKAfQIFCOtiUSYWFZGQGXrAeUK1dUVJACXtNWgUKDQDnclv5+GoJBu8soOKvDYXwivKxBoNAgUA7Wn0jwdjxOQyBgdykFJ+D1sjwQODUGo9xNg0A51rZIBIBl2iKwxMXhMG/29ZHMs82pVPZpECjH2poOAm0RWGN1OEx/MsnO9L+zci8NAuVYWyIRQl4vc0pL7S6lII1MyV13jvs8q8KjQaAca3skwlK/H49Ll4y22sKyMip9Ph0nUBoEyrl2R6Ocn+H+A+rciQirQiHe0CBwPQ0C5Uh9iQRHBgcz3ohGTczqcJjtkQj9emKZq2kQKEdqicUAWKRBYKmLw2FSDJ/BrdxLg0A50u5oFIDFZWU2V1LYVumAsUKDQDnU7mgUARZoEFhqSlERC8rKdJzA5TQIlCPticWoKy2lVNcYstzFoRDrensxemKZa2kQKEfaHY2ySFsDObEqHKZ9cJB23bHMtTQIlOMYY9gTi+mMoRwZWeJ7g44TuJYGgXKc9sFB+pNJDYIcWR4IIMDG/n67S1E20SBQjjMyY0i7hnIj6PNxvt+vLQIX0yBQjnNq6qi2CHLmwlBIg8DFNAiU47TEYpR5PNSWlNhdimusDAY5MjjI0YEBu0tRNtAgUI7TFo8zt7RUF5vLoZEBYx0ncCcNAuU4e2Mx5un4QE6tSG/+s1G7h1xJg0A5ijGGtniceboHQU6FfD4WlZWxQVsErmRpEIjIdSKyW0RaReTuMe7/QxHZKiKbReRVEVlqZT3K+Y4PDdGXTDJXgyDndMDYvSwLAhHxAg8C1wNLgVvG+KD/sTFmmTGmEXgA+KZV9aj8sDceB9CuIRusDAY5ODBAp55h7DpWtghWAa3GmDZjzCDwOHDD6AOMMaNXugoAutiJy7Wlg0BbBLmnA8buZWUQ1AIHR10/lL7tHUTksyLyNsMtgs+N9UQicoeINItIc2dnpyXFKmdoS+9DoEGQeyMDxto95D5WBsFYc//e9Y3fGPOgMWY+8OfAPWM9kTHmYWNMkzGmqaamJstlKifZG48ztaiIoM9ndymuU1FUxPzSUg0CF7IyCA4Bs0ZdnwkcOcvxjwO/bWE9Kg+0xWLaGrDRylCIt7RryHWsDIL1wEIRmSsixcDNwNrRB4jIwlFXPwi0WFiPygN743EdKLbR8mCQt+NxenUPY1exLAiMMQngLuBZYCfwU2PMdhG5X0TWpA+7S0S2i8hm4IvAp62qRzlfIpVif/qsYmWPxvQ4wRZtFbiKpR2xxpingadPu+3eUZc/b+Xrq/xycGCAJDp11E7LAwEA3opEuLyiwuZqVK7omcVqTBv7+rh/3z7e6OnJ2WueOodAWwS2qS0pYYrPx2ZtEbiKBoF6l193d7N640a+sm8fl2/axNqurpy8rk4dtZ+I0BgMahC4jAaBeoeBVIrbd+9mXmkpbRdfzPJgkNt3787J4GFbPI5PhJm6/LStlgeDbItESKRSdpeickSDQL3Dzzo62BuP8/cLFjC3rIzvLlpE59AQ3z1ytpm/2bE3HmdOSQk+j/5a2qkxGCSeSrEn3UJThU//4tQ7fOfIERaWlfH+qioALgqHuaqigu+1t2OMtSuAtMVizNWBYtstT88c0vMJ3EODQJ3SFovxem8vfzBjxjs2hfnUtGm0xmKs6+09y6Oz8Pq6/LQjnO/3Uyyi4wQuokGgTvnl8eMAfPS0ZTw+VlNDqcfD4x0dlr12XyJB19CQDhQ7QLHHwwWBgAaBi2gQqFPWdnWx1O9n/mndM2Gfj6srKnj6xAnLXluXn3aW5cGgdg25iAaBAiCaTPJKTw8fnDJlzPuvr6qiJRbjbYsGEEemjmrXkDM0BoMcGxrSzexdQoNAAbC+r48hY3jPGc4m
vS49ePxfFrUKRloEOljsDCNLTWj3kDtoECgAXjl5EoBLw+Ex71/o9zOrpOTUcdnWFo8T8nqp0uWnHaFh1FITqvBpECgAXu3poT4QoLKo6IzHXF5ezis9PZZMI90bizGvtBSRsbaxULlWWVTEnJISbRG4REZBICJPiMgHRUSDowAljeG13l4uLy8/63GXl5dzZHCQ/elunGxq0+WnHWe5LjXhGpl+sP8z8LtAi4h8XUTOt7AmlWPbIxH6kkkuO0O30IjL0kHxapYXojPGDO9DoAPFjtIYDLInGiWaTNpdirJYRkFgjPmVMeYTwEpgH/C8iLwmIr8vImfuS1B5YWN6a8Km9OblZ1IfCBD2erMeBEcHB4mnUjpQ7DCNwSApYJuOExS8jLt6RGQKcCtwO7AJ+EeGg+F5SypTObOpvx+/x8NCv/+sx3lFuDgc5s0s72nbpstPO9JynTnkGpmOEfwceAXwAx82xqwxxvzEGPPHQNDKApX1NvX3szwYxJvBQO2FoRDbIhHiWewu2KvLTztSXWkpYa9XTyxzgUxbBP9ijFlqjPmaMaYdQERKAIwxTZZVpyyXMobN/f2sDGaW5xcGgwwZw9YsdheMtAjqNAgcxSNCgw4Yu0KmQfB/x7jt9WwWouzxdixGXzLJinHGB0aMjCNsyGL3UFssxnnFxZR6vVl7TpUdjcEgWyIRUhavPKvsddazd0RkOlALlInICmCk7yDMcDeRynMj3/ZWZNgimFNaSpXPx4Ysfkvcq1NHHasxGKQ/maQtFmPBOGNIKn+Ndxrn+xkeIJ4JfHPU7X3Aly2qSeXQ1kgED7A0wz9yEeHCUCi7LYJ4nPfqRumONLKZ/eb+fg2CAnbWIDDG/BvwbyLyMWPMEzmqSeXQjkiE+WVl59Qtc2EoxN8ePEg8mZx0d85AKsXhgQEdKHaoCwIBvAwvNXGj3cUoy4zXNfRJY8wPgToR+eLp9xtjvjnGw1Qe2RGNckH6W1+mLgwGSaQHjC8a5yS08eyPxzHo8tNOVeb1stjv1wHjAjfeYPHIJ0QQCI3xo/LYYCpFSyyWcbfQiJXpAeNNWfhwOLXqqLYIHKtR9yYoeON1DT2U/u9f5aYclUstsRgJY1h6ji2CuaWllHu9WQmCU/sQaIvAsRqDQX7c0cHxoSGmnGVRQpW/Mj2h7AERCYtIkYi8ICJdIvJJq4tT1tqePhfggnNsEYgIjVmaX94Wj1Miwozi4kk/l7KGbmZf+DI9j+B9xphe4EPAIWAR8GeWVaVyYkckggCLJzAbpDEYZEt/P8lJzi/fG4tRV1qKR5efdiwNgsKXaRCMtAc/ADxmjLFu81qVMzuiUeaVllI2gZk/K0IhoqkUe6LRSdWgy08737TiYqYXF+uAcQHLNAh+KSK7gCbgBRGpAbK/KL3Kqe2RyDnPGBoxcgLaZMcJ9sbjOlCcB3TAuLBlugz13cAlQJMxZgiIADdYWZiyVtIYWmIxzp/gSUJL/H5KRCYVBMeHhjiZSDBfWwSO1xgMsiMaZTCVsrsUZYFz2SB2CcPnE4x+zA+yXI/Kkf3xOEPGsHCCH8JFHg/1gcCkugta0t1KE61B5c7yQIAhY9gRidCY4bpUKn9kFAQi8igwH9gMjKw/bNAgyFst6WmbiyaxbEBjMMgvurowxkxor+E9WahB5UbjyICxBkFByrRF0AQsNVbsWq5skY1v4ytCIb5/9CiHBgaYNYF+/pZYDA96Mlk+WOj3U+bxsLm/n0/bXYzKukwHi7cB060sROVWSyxG0Otl+iTm7092wHhPNMrc0lKKPRlvlKds4hVhWSCgA8YFKtO/wGpgh4g8KyJrR36sLExZa08sxoKysgl16YxoCAQQJh4ELbHYuNtjKucYOYlQOwYKT6ZdQ/dN5MlF5DqG9zb2MrzL2ddPu/+LDO+BnAA6gc8YY/ZP5LXUuWmJRrlwkn29QZ+PRWVlExowNulZS1eUl0+qBpU7y4NBHm5vn3BXoHKuTKeP/gbY
BxSlL68HNp7tMSLiBR4ErgeWAreIyNLTDtvE8JTUBuDfgQfOqXo1IUOpFPvi8azM1lkRCrFpAnsTHB0cpD+Z1BlDeaRRN7MvWJmuNfQHDH9QP5S+qRb4xTgPWwW0GmPajDGDwOOcdu6BMeZFY8zIqalvMLwBjrLY3nicJGSlW6YxGGT/wAAnhobO6XHZmLWkcmtZuitQg6DwZDpG8FngMqAXwBjTAkwd5zG1wMFR1w+lbzuT24BnxrpDRO4QkWYRae7s7MywZHUme7I4f3/FBL8lZrMGlRshn4/5ZWU6YFyAMg2CgfS3egDSJ5WNN2I01ijkmI9Jr2TaBHxjrPuNMQ8bY5qMMU01NTUZlqzO5NS38SwGwbkOGO+JxSgWYbb2NeeVbK06q5wl0yD4jYh8meFN7K8Ffgb8cpzHHAJmjbo+Ezhy+kEicg3wl8AaY8xAhvWoSWiJxajw+bKytnxNcTG1E1iQbHskwvl+P15ddTSvLA8EeDsepy+RsLsUlUWZBsHdDM/q2QrcCTwN3DPOY9YDC0VkrogUAzcD75hyKiIrGB53WGOM6TiXwtXEtcRiLJzk1NHRJjJgPJkF75R9RgaMt6T3slCFIdNZQymGB4f/yBhzozHme+OdZWyMSQB3Ac8CO4GfGmO2i8j9IrImfdg3GN4G82cislnPTciNlmg0q33zjcEgu6JRYsnk+AcD/YkE+wcGNAjykM4cKkzjbV4vwFcY/kCX9E1J4J+MMfeP9+TGmKcZbj2Mvu3eUZevmUjRauLiySQHBgb4/SzO1lkRDJIEtkYirMpgM/sd6YHic90ZTdmvtqSEKp9PB4wLzHgtgi8wPFvoImPMFGNMFXAxcJmI/Inl1amsezsex5Dd2TrnOmA8skVmvbYI8k42tylVzjFeEHwKuMUYs3fkBmNMG/DJ9H0qz1ix9HNdaSkVPl/GHw7bIhFKPR7m6tTRvLQ8GGRrJEJC9yYoGOMFQZExpuv0G40xnfzP9pUqj4xMHc1mEIx8S8x0wHh7JMISnTGUtxqDQeKp1KnfJZX/xguCwQnepxxqTyxGTVERFVmYOjraimCQLZHIuJvZG2PY0N9/qjtJ5R8dMC484wXBchHpHeOnD1iWiwJVdmV7xtCIxmCQWCrF7nE2sz8wMEDX0BBNurlJ3jrf76dIRAeMC8hZg8AY4zXGhMf4CRljtGsoD42cQ5BtI9/wN47TPdScvl+DIH8VezxcMMltSpWz6I4gLtKfSHBkcNCShd6WBgKEvV5e7ek563HNfX0UidCgXUN5bUUwyEbdm6BgaBC4SKsFA8UjvCJcUV7OSydPnvW49b29LAsEKNFdyfJaUyhE59AQBwd0VZhCoH+NLmLFjKHRrqqoYHcsRvsZPhxSxtDc1zfpDXGU/Ub+H26YwF4Uynk0CFxkJAgWWBQE76moAODlM3QPvdXfT08yqbuSFYCGQACfyKkxH5XfNAhcpCUWY0ZxMUFfpjuUnpsVwSAhr5dfd3ePef+L6W6jq9OBofJXmdfLBX6/tggKhAaBi1g1dXSEz+Ph2spKnjp+nNQYg4i/7u5mYVkZM3UPgoLQFArR3NenA8YFQIPARayaOjrab1dXc2Rw8F3fFKPJJC+dPMk1lZWWvr7KnaZQiOOJBAd0wDjvaRC4RG8iQcfQUFb2KT6bD06Zghf499O2FH3mxAkiqRQ36g5zBWNkwFjHCfKfBoFLWD1jaERVUREfmjKFfz16lIFRi5I93tHB1KIirtSB4oLREAxSJKLjBAVAg8AlrFh19Ez+qLaWzqEhftIxvOncgXicX3R18Ylp0/Dp+QMFo8TjoT4Q0BZBAdC/SpcYaRHMz0EQXFNZyYpgkC+3tdE+MMCftLbiAb4wc6blr61yqykUYoMOGOc9DQKXaInFmFlSgt/rtfy1PCI8vGgRJxIJznv9dX7e1cVfz53LbJ0tVHAuDIU4kUiwLx63uxQ1CdZMKFeOk4sZQ6M1hcO8umIF/3b0KKvDYW6e
OjVnr61yp2nUgLFuNJS/NAhcoiUa5WM5nrGzMhRipS4nUdDqAwGK0wPGN2nY5y3tGnKBE0NDHE8kctoiUO5Q4vGwTAeM854GgQucmjpq8TkEyp0uCodZ39c35tnkKj9oELhALqeOKvdZHQ7Tm0yya5zd6ZRzaRC4QEsshgDzdNaOssAl4TAAr/f22lyJmigNAhdoicWYU1pKaQ6mjir3WVhWRqXPxxsaBHlLg8AFcj11VLmLiLA6HOb1cbYpVc6lQVDgjDGWLz+t1CXhMDuiUXoSCbtLUROgQVDgOoaG6EkmLdmwXqkRq8NhDPCmdg/lJQ2CAjcyk2OJBoGy0KpwGAEdJ8hTGgQFbmckAsD5GgTKQuU+H0v8fg2CPKVBUOB2RaMEPB5mlZTYXYoqcJeEw7zR26srkeYhDYICtzMa5Xy/HxGxuxRV4FaHw5xIJNiTPpNd5Q8NggI3EgRKWe3S9O5z/63TSPOOBkEB608kODgwwJJAwO5SlAss8fupLiri5ZMn7S5FnSMNggK2O91E1xlDKhdEhCvKy3lZWwR5x9IgEJHrRGS3iLSKyN1j3H+liGwUkYSI3GhlLW40MnVUu4ZUrlxZXs7eeJyDumNZXrEsCETECzwIXA8sBW4RkaWnHXYAuBX4sVV1uNnOSAQvsEDPKlY5cmVFBQCvaKsgr1jZIlgFtBpj2owxg8DjwA2jDzDG7DPGbAFSFtbhWruiURaUlVHs0R5AlRvLg0HCXq+OE+QZKz8haoGDo64fSt+mcmSHzhhSOeYV4XIdJ8g7VgbBWBPXJ3SmiYjcISLNItLc2dk5ybLcYSCVYk80yrJg0O5SlMtcWVHBzmiUjsFBu0tRGbIyCA4Bs0ZdnwkcmcgTGWMeNsY0GWOaanK8AXu+2hmJkASW6dRRlWNXps8n0HGC/GFlEKwHForIXBEpBm4G1lr4emqULek1hho0CFSONYVCBL1eXujutrsUlSHLgsAYkwDuAp4FdgI/NcZsF5H7RWQNgIhcJCKHgJuAh0Rku1X1uM2W/n5KRHTGkMq5Io+HqyoqeF6DIG/4rHxyY8zTwNOn3XbvqMvrGe4yUlm2NRLhgkAAn84YUja4trKSp44fZ28sxlz9MuJ4+ilRoLZEIjo+oGzzvspKAG0V5AkNggLUOTjI0cFBGnTGkLLJYr+fmSUlGgR5QoOgAG1NDxRri0DZRUS4trKSF7q7Ser+BI6nQVCANvX3A8NneSpll2srK+lOJNjQ12d3KWocGgQFqLmvj1klJUwtLra7FOVi16THCZ47ccLmStR4NAgKUHNfH02hkN1lKJerKS5mVSjEL48ft7sUNQ4NggLTPTREayzGRRoEygHWVFfzZl8f7QMDdpeizkKDoMBsTI8PaItAOcGaKVMAeEpbBY6mQVBgmtMDcxdqECgHqA8EmFtaypNdXXaXos5Cg6DANPf1Ma+0lKqiIrtLUQoRYc2UKfyqu5tIMml3OeoMNAgKzJu9vdoaUI6yprqaAWN4VmcPOZYGQQE5GI9zYGCAy9LLACvlBFeWl1NTVMRPOjrsLkWdgQZBARlZ//1KDQLlID6Ph5tqavjl8eP0JxJ2l6PGoEFQQF4+eZKw16trDCnHuWXqVGKpFE/q7CFH0iAoIK/09HBpeTleGWuXUKXsc2l5ObNKSnjs2DG7S1Fj0CAoEF2Dg+yIRrlCu4WUA3lEuHnqVJ7t7ub40JDd5ajTaBAUiJXg8bIAAAiqSURBVFd1fEA53O9Nm0bCGB49etTuUtRpNAgKxPPd3fg9Hi4Kh+0uRakxLQsGWR0O81B7O0aXpnYUDYICYIzhmRMneG9lJSW6NaVysDtnzGBXNMrL6Rascgb91CgArbEYe+Nxrq+qsrsUpc7qd6ZOpdzr5aEjR+wuRY2iQVAAnkmfsXmdBoFyOL/Xy6emT+eJzk5dkdRBNAgKwNquLhaXlTGvrMzuUpQa1+dqa0kYwz8eOmR3KSpNgyDPdQwO8uLJk9w0dard
pSiVkQV+PzfW1PDPR47Qo2caO4IGQZ77j64uUsDv1NTYXYpSGfvz2bPpTSZ58PBhu0tRaBDkvZ90dLC4rIz6QMDuUpTK2MpQiA9NmcIDBw5wQk8ws50GQR57OxbjxZMn+cS0aYguK6HyzNfmzqU3meRv9u+3uxTX0yDIY987cgQvcNuMGXaXotQ5qw8G+fT06fzT4cO0RKN2l+NqGgR5aiCV4l+PHuXD1dWcV1JidzlKTcjfzJ1LmcfDnXv26NnGNtIgyFOPtLfTMTTEXbW1dpei1ITNKCnhgfnzefHkSb7f3m53Oa6lQZCHBlMpvnbgAJeEw7y3osLucpSalNtnzODqigo+19rK9kjE7nJcSYMgDz14+DAHBwa4d84cHSRWec8jwo+WLCHs9XLj9u16boENNAjyzOGBAe7dt48PVFXxfl1SQhWIGSUlPLZ0Ka2xGDds3Uo8mbS7JFfRIMgjSWP4zK5dJIzhWwsXamtAFZSrKyv5wfnn85ueHj6yfbvub5xDGgR55Ct79/JcdzffWrCA+bqukCpAt0ybxr8sXsxzJ07w3rfe4mA8bndJrqBBkAeMMfzN/v389YED3D5jBrfreQOqgN02Ywb/UV/PjkiExuZmHjt2TKeWWkyDwOF6Egk+vWsXf7l3L787dSrfXbRIu4RUwVtTXc2mpibmlZXxuzt3csWmTbzQ3a2BYBFLg0BErhOR3SLSKiJ3j3F/iYj8JH3/OhGps7KefNI9NMTfHjjA4nXr+OGxY9xXV8ejS5bg1RBQLrHQ7+eNlSv53qJFtMZiXPPWW9SvX89X9+1jS3+/hkIWiVX/mCLiBfYA1wKHgPXALcaYHaOO+SOgwRjzhyJyM/ARY8zHz/a8TU1Nprm52ZKac80YQ38ySXciQefQEHuiUXZGo7zS08OrPT0kjOGqigr+bv58VoZCdperlG3iySSPd3TwUHs7b/T2AlDp83FhKERDIEBdaSlzSkupLSmhwuej3Oul3OejSLduPUVENhhjmsa6z2fh664CWo0xbekiHgduAHaMOuYG4L705X8Hvi0iYixIp0fa2/nGwYPA8AewAUZexIy67dT1kcsZHjtSsjmHx0dTKRKnvVUPUB8I8KVZs/h4TQ2NGgBKUer1cuuMGdw6YwbtAwM8c+IE63p7ae7r48HDhxk4w0dGkcj//Hg8+EZdH922Ht3dKqf9912XxznWSvfW1fFxC/YesTIIaoGDo64fAi4+0zHGmISI9ABTgK7RB4nIHcAdALNnz55QMdVFRSwLBN7xP05G/TLIqNtGXz+XY0/dnz52vMf7vV4qfT4qfT6qiopYWFbGwrIySr3eCb1HpdxgRkkJn5kxg8+kJ00YY+gYGmJ/PE774CA9icSpn/5kkoQxDKV/Tl1OpU493+gIMWPdNipkxjvWapU+az6yrQyCsULy9H+zTI7BGPMw8DAMdw1NpJg11dWsqa6eyEOVUg4mIkwrLmZacbHdpeQtKzvQDgGzRl2fCRw50zEi4gPKgRMW1qSUUuo0VgbBemChiMwVkWLgZmDtacesBT6dvnwj8GsrxgeUUkqdmWVdQ+k+/7uAZwEv8IgxZruI3A80G2PWAt8HHhWRVoZbAjdbVY9SSqmxWTlGgDHmaeDp0267d9TlOHCTlTUopZQ6O51kq5RSLqdBoJRSLqdBoJRSLqdBoJRSLmfZWkNWEZFOYL/ddZxBNaedFZ3nCu39gL6nfFBo7wec8Z7mGGNqxroj74LAyUSk+UyLOuWjQns/oO8pHxTa+wHnvyftGlJKKZfTIFBKKZfTIMiuh+0uIMsK7f2Avqd8UGjvBxz+nnSMQCmlXE5bBEop5XIaBEop5XIaBFkkIjeJyHYRSYmIY6eKZUJErhOR3SLSKiJ3213PZInIIyLSISLb7K4lG0Rkloi8KCI7079zn7e7pskSkVIReVNE3kq/p7+yu6ZsEBGviGwSkafsruVMNAiyaxvwUeBluwuZDBHxAg8C1wNLgVtE
ZKm9VU3a/wOus7uILEoAf2qMWQKsBj5bAP+PBoD3GmOWA43AdSKy2uaasuHzwE67izgbDYIsMsbsNMbstruOLFgFtBpj2owxg8DjwA021zQpxpiXKaDd74wx7caYjenLfQx/0NTaW9XkmGH96atF6Z+8ns0iIjOBDwL/YnctZ6NBoMZSCxwcdf0Qef4hU8hEpA5YAayzt5LJS3ejbAY6gOeNMfn+nv4B+N9Ayu5CzkaD4ByJyK9EZNsYP3n9jfk0MsZtef3NrFCJSBB4AviCMabX7nomyxiTNMY0MrzH+SoRqbe7pokSkQ8BHcaYDXbXMh5LdygrRMaYa+yuIQcOAbNGXZ8JHLGpFnUGIlLEcAj8yBjzc7vrySZjzEkReYnhcZ18HeC/DFgjIh8ASoGwiPzQGPNJm+t6F20RqLGsBxaKyFwRKWZ4L+m1NtekRhERYXjP753GmG/aXU82iEiNiFSkL5cB1wC77K1q4owxf2GMmWmMqWP4b+jXTgwB0CDIKhH5iIgcAi4B/lNEnrW7pokwxiSAu4BnGR6E/KkxZru9VU2OiDwGvA4sFpFDInKb3TVN0mXA7wHvFZHN6Z8P2F3UJM0AXhSRLQx/GXneGOPYKZeFRJeYUEopl9MWgVJKuZwGgVJKuZwGgVJKuZwGgVJKuZwGgVJKuZwGgVJKuZwGgVJKudz/B4ug/CoCJ80dAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# log1p(x) computes log(1 + x); the +1 avoids log(0) for rows with zero pregnancies\n", "np.log1p(df['Pregnancies']).plot.density(color='c')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## One-hot encoding\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"diabetes.csv\")" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th><th>AgeCategogy</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td><td>middle</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td><td>young</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td><td>young</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td><td>young</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td><td>young</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome AgeCategogy \n", "0 0.627 50 1 middle \n", "1 0.351 31 0 young \n", "2 0.672 32 1 young \n", "3 0.167 21 0 young \n", "4 2.288 33 1 young " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['AgeCategogy'] = pd.cut(df['Age'],bins=[0, 35, 55, 120], labels=['young', 'middle', 'old'])\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th><th>AgeCategogy_young</th><th>AgeCategogy_middle</th><th>AgeCategogy_old</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td><td>1</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td><td>1</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", "0 6 148 72 35 0 33.6 \n", "1 1 85 66 29 0 26.6 \n", "2 8 183 64 0 0 23.3 \n", "3 1 89 66 23 94 28.1 \n", "4 0 137 40 35 168 43.1 \n", "\n", " DiabetesPedigreeFunction Age Outcome AgeCategogy_young \\\n", "0 0.627 50 1 0 \n", "1 0.351 31 0 1 \n", "2 0.672 32 1 1 \n", "3 0.167 21 0 1 \n", "4 2.288 33 1 1 \n", "\n", " AgeCategogy_middle AgeCategogy_old \n", "0 1 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replace the categorical column with one 0/1 indicator column per category\n", "df = pd.get_dummies(df, columns=['AgeCategogy'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Value splitting (feature split)\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Team</th><th>City</th><th>Games</th><th>MVP_Player</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Eagles</td><td>Rome</td><td>12</td><td>John Stuart</td></tr>\n",
"    <tr><th>1</th><td>Bears</td><td>Helsinki</td><td>15</td><td>Leo Da Vinci</td></tr>\n",
"    <tr><th>2</th><td>Raptors</td><td>Hong Kong</td><td>23</td><td>Mike Donatello</td></tr>\n",
"    <tr><th>3</th><td>Hornets</td><td>Hong Kong</td><td>18</td><td>Raphael Dolce</td></tr>\n",
"    <tr><th>4</th><td>Bees</td><td>Rome</td><td>21</td><td>Bruce Lee</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Team City Games MVP_Player\n", "0 Eagles Rome 12 John Stuart\n", "1 Bears Helsinki 15 Leo Da Vinci\n", "2 Raptors Hong Kong 23 Mike Donatello\n", "3 Hornets Hong Kong 18 Raphael Dolce\n", "4 Bees Rome 21 Bruce Lee" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({\n", "    'Team': ['Eagles', 'Bears', 'Raptors', 'Hornets', 'Bees', 'Lions'],\n", "    'City': ['Rome', 'Helsinki', 'Hong Kong', 'Hong Kong', 'Rome', 'Rome'],\n", "    'Games': [12, 15, 23, 18, 21, 8],\n", "    'MVP_Player': ['John Stuart', 'Leo Da Vinci', 'Mike Donatello', 'Raphael Dolce', 'Bruce Lee', 'Mahatma Gandhi'],\n", "})\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def extract_name(fullname):\n", "    # Return the first space-separated token of a full name\n", "    return fullname.split(' ')[0]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Team</th><th>City</th><th>Games</th><th>MVP_Player</th><th>Name</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Eagles</td><td>Rome</td><td>12</td><td>John Stuart</td><td>John</td></tr>\n",
"    <tr><th>1</th><td>Bears</td><td>Helsinki</td><td>15</td><td>Leo Da Vinci</td><td>Leo</td></tr>\n",
"    <tr><th>2</th><td>Raptors</td><td>Hong Kong</td><td>23</td><td>Mike Donatello</td><td>Mike</td></tr>\n",
"    <tr><th>3</th><td>Hornets</td><td>Hong Kong</td><td>18</td><td>Raphael Dolce</td><td>Raphael</td></tr>\n",
"    <tr><th>4</th><td>Bees</td><td>Rome</td><td>21</td><td>Bruce Lee</td><td>Bruce</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Team City Games MVP_Player Name\n", "0 Eagles Rome 12 John Stuart John\n", "1 Bears Helsinki 15 Leo Da Vinci Leo\n", "2 Raptors Hong Kong 23 Mike Donatello Mike\n", "3 Hornets Hong Kong 18 Raphael Dolce Raphael\n", "4 Bees Rome 21 Bruce Lee Bruce" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Equivalent ways to extract the first name; each assignment overwrites the previous one.\n", "# 1) Element-wise apply with a lambda (left commented out):\n", "# df['Name'] = df['MVP_Player'].apply(lambda fullname: fullname.split(' ')[0])\n", "# 2) Row-wise apply, which gives access to every column of the row:\n", "df['Name'] = df.apply(lambda row: row['MVP_Player'].split(' ')[0], axis=1)\n", "# 3) Element-wise apply with the named helper defined above:\n", "df['Name'] = df['MVP_Player'].apply(extract_name)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature scaling\n", "\n", "\n", "\n", "Feature scaling is a transformation applied to numeric variables so that the values of different variables fall within comparable ranges. It is needed when using algorithms that are sensitive to the magnitudes of the variables.\n", "\n", "A widely used scaling method is based on the z-score (standard score); it produces values centered at zero with a standard deviation of 1.\n", "\n", "The z-score measures how many standard deviations a value lies from the mean: $z = (x - \\mu) / \\sigma$, where $\\mu$ is the mean and $\\sigma$ the standard deviation of the variable.\n", "\n", "__[Boston house prices dataset](https://scikit-learn.org/stable/datasets/index.html#boston-dataset)__" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,\n", "# so this import only works with older scikit-learn versions.\n", "from sklearn.datasets import load_boston\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>CRIM</th><th>ZN</th><th>INDUS</th><th>CHAS</th><th>NOX</th><th>RM</th><th>AGE</th><th>DIS</th><th>RAD</th><th>TAX</th><th>PTRATIO</th><th>B</th><th>LSTAT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.00632</td><td>18.0</td><td>2.31</td><td>0.0</td><td>0.538</td><td>6.575</td><td>65.2</td><td>4.0900</td><td>1.0</td><td>296.0</td><td>15.3</td><td>396.90</td><td>4.98</td></tr>\n",
"    <tr><th>1</th><td>0.02731</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>6.421</td><td>78.9</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>396.90</td><td>9.14</td></tr>\n",
"    <tr><th>2</th><td>0.02729</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>7.185</td><td>61.1</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>392.83</td><td>4.03</td></tr>\n",
"    <tr><th>3</th><td>0.03237</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>6.998</td><td>45.8</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>394.63</td><td>2.94</td></tr>\n",
"    <tr><th>4</th><td>0.06905</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>7.147</td><td>54.2</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>396.90</td><td>5.33</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n", "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n", "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n", "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n", "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n", "\n", " PTRATIO B LSTAT \n", "0 15.3 396.90 4.98 \n", "1 17.8 396.90 9.14 \n", "2 17.8 392.83 4.03 \n", "3 18.7 394.63 2.94 \n", "4 18.7 396.90 5.33 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "boston_dataset = load_boston()\n", "df = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardScaler(copy=True, with_mean=True, with_std=True)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaler = StandardScaler()\n", "scaler.fit(df)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.41978194, 0.28482986, -1.2879095 , ..., -1.45900038,\n", " 0.44105193, -1.0755623 ],\n", " [-0.41733926, -0.48772236, -0.59338101, ..., -0.30309415,\n", " 0.44105193, -0.49243937],\n", " [-0.41734159, -0.48772236, -0.59338101, ..., -0.30309415,\n", " 0.39642699, -1.2087274 ],\n", " ...,\n", " [-0.41344658, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.44105193, -0.98304761],\n", " [-0.40776407, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.4032249 , -0.86530163],\n", " [-0.41500016, -0.48772236, 0.11573841, ..., 1.17646583,\n", " 0.44105193, -0.66905833]])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array = scaler.transform(df)\n", "array" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { 
"text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>CRIM</th><th>ZN</th><th>INDUS</th><th>CHAS</th><th>NOX</th><th>RM</th><th>AGE</th><th>DIS</th><th>RAD</th><th>TAX</th><th>PTRATIO</th><th>B</th><th>LSTAT</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-0.419782</td><td>0.284830</td><td>-1.287909</td><td>-0.272599</td><td>-0.144217</td><td>0.413672</td><td>-0.120013</td><td>0.140214</td><td>-0.982843</td><td>-0.666608</td><td>-1.459000</td><td>0.441052</td><td>-1.075562</td></tr>\n",
"    <tr><th>1</th><td>-0.417339</td><td>-0.487722</td><td>-0.593381</td><td>-0.272599</td><td>-0.740262</td><td>0.194274</td><td>0.367166</td><td>0.557160</td><td>-0.867883</td><td>-0.987329</td><td>-0.303094</td><td>0.441052</td><td>-0.492439</td></tr>\n",
"    <tr><th>2</th><td>-0.417342</td><td>-0.487722</td><td>-0.593381</td><td>-0.272599</td><td>-0.740262</td><td>1.282714</td><td>-0.265812</td><td>0.557160</td><td>-0.867883</td><td>-0.987329</td><td>-0.303094</td><td>0.396427</td><td>-1.208727</td></tr>\n",
"    <tr><th>3</th><td>-0.416750</td><td>-0.487722</td><td>-1.306878</td><td>-0.272599</td><td>-0.835284</td><td>1.016303</td><td>-0.809889</td><td>1.077737</td><td>-0.752922</td><td>-1.106115</td><td>0.113032</td><td>0.416163</td><td>-1.361517</td></tr>\n",
"    <tr><th>4</th><td>-0.412482</td><td>-0.487722</td><td>-1.306878</td><td>-0.272599</td><td>-0.835284</td><td>1.228577</td><td>-0.511180</td><td>1.077737</td><td>-0.752922</td><td>-1.106115</td><td>0.113032</td><td>0.441052</td><td>-1.026501</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE \\\n", "0 -0.419782 0.284830 -1.287909 -0.272599 -0.144217 0.413672 -0.120013 \n", "1 -0.417339 -0.487722 -0.593381 -0.272599 -0.740262 0.194274 0.367166 \n", "2 -0.417342 -0.487722 -0.593381 -0.272599 -0.740262 1.282714 -0.265812 \n", "3 -0.416750 -0.487722 -1.306878 -0.272599 -0.835284 1.016303 -0.809889 \n", "4 -0.412482 -0.487722 -1.306878 -0.272599 -0.835284 1.228577 -0.511180 \n", "\n", " DIS RAD TAX PTRATIO B LSTAT \n", "0 0.140214 -0.982843 -0.666608 -1.459000 0.441052 -1.075562 \n", "1 0.557160 -0.867883 -0.987329 -0.303094 0.441052 -0.492439 \n", "2 0.557160 -0.867883 -0.987329 -0.303094 0.396427 -1.208727 \n", "3 1.077737 -0.752922 -1.106115 0.113032 0.416163 -1.361517 \n", "4 1.077737 -0.752922 -1.106115 0.113032 0.441052 -1.026501 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_scaled = pd.DataFrame(array, columns=df.columns)\n", "df_scaled.head()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1.18321596],\n", " [-0.50709255],\n", " [ 0.16903085],\n", " [ 1.52127766]])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Revisión con menos datos\n", "data = [[-1]\n", " , [-0.5]\n", " , [0]\n", " , [1]\n", " ]\n", "scaler.fit(data)\n", "scaler.transform(data)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "mean_a = np.array([-1,-0.5, 0, 1]).mean()\n", "std_a = np.array([-1,-0.5, 0, 1]).std()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-0.125\n", "0.739509972887452\n" ] } ], "source": [ "print(mean_a)\n", "print(std_a)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1.1832159566199232" ] }, "execution_count": 58, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "(-1 - mean_a) / std_a" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.52127765851133" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(data[3][0] - mean_a) / std_a" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }